Vibration data analysis and image analysis for robust action recognition in retail environment

ABSTRACT

A method for using vibration data analysis and image analysis for robust action recognition in a retail environment may include receiving vibration data captured using one or more vibration sensors mounted to a shelving unit including at least one retail shelf; receiving at least one image captured using at least one image sensor from a retail environment including the shelving unit; analyzing the vibration data and the at least one image to detect an action performed in the retail environment; and providing information based on the detected action.

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of priority of U.S. Provisional Application No. 63/113,490, filed Nov. 13, 2020. The foregoing application is incorporated herein by reference in its entirety.

BACKGROUND

I. Technical Field

The present disclosure relates generally to systems and methods for deriving information from sensors in a retail environment, and more specifically to systems and methods for deriving information from image, infrared, and vibration sensors in a retail environment.

II. Background Information

Shopping in stores is a prevalent part of modern daily life. Store owners (also known as “retailers”) stock a wide variety of products in retail stores and add associated labels and promotions in the retail stores. Managing and operating retail stores efficiently is an ongoing effort consuming tremendous resources. Placing cameras in the retail stores, and using image analysis to determine information for enhancing and improving retail store operation and management, is becoming prevalent. However, at large scale, image analysis is still expensive, and the level of detail and accuracy of the information derived from the image analysis is still insufficient for many tasks.

The disclosed devices and methods are directed to providing new ways for deriving information in retail stores in an efficient manner.

SUMMARY

Embodiments consistent with the present disclosure provide methods, systems, and computer-readable media for deriving information from sensors in a retail environment. Some non-limiting examples of such sensors may include image sensors, infrared sensors, vibration sensors, and so forth.

In some embodiments, methods, systems, and computer-readable media are provided for triggering image processing based on infrared data analysis.

In some examples, first infrared input data captured using a first group of one or more infrared sensors may be received. The first infrared input data may be analyzed to detect an engagement of a person with a retail shelf. Second infrared input data captured using a second group of one or more infrared sensors after the capturing of the first infrared input data may be received. The second infrared input data may be analyzed to determine a completion of the engagement of the person with the retail shelf. In one example, for example in response to the determined completion of the engagement of the person with the retail shelf, at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf may be analyzed. The analysis of the at least one image may be used to determine a state of the retail shelf.
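
By way of a non-limiting illustration, the following Python sketch outlines one possible realization of the flow described above: image analysis runs only once the infrared data indicates that the engagement has ended. The activation threshold, the function names, and the synthetic inputs are assumptions introduced here solely for illustration and are not part of the disclosed embodiments.

```python
import numpy as np

ENGAGEMENT_THRESHOLD = 0.5  # assumed activation level for the infrared signal

def detect_engagement(ir_frame):
    """Return True when the infrared frame suggests a person at the shelf."""
    return float(np.mean(ir_frame)) > ENGAGEMENT_THRESHOLD

def engagement_completed(ir_frame):
    """Return True when the infrared activity has dropped back toward baseline."""
    return float(np.mean(ir_frame)) <= ENGAGEMENT_THRESHOLD

def analyze_shelf_image(image):
    """Placeholder for the image analysis that determines the shelf state."""
    return {"state": "stocked" if image.mean() > 0.4 else "depleted"}

def process_event(first_ir, second_ir, capture_image):
    """Trigger image analysis only after the engagement is over."""
    if not detect_engagement(first_ir):
        return None                 # no engagement detected, no image work
    if not engagement_completed(second_ir):
        return None                 # person still at the shelf, keep waiting
    return analyze_shelf_image(capture_image())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    first_ir = rng.uniform(0.6, 0.9, size=(8, 8))    # activity detected
    second_ir = rng.uniform(0.0, 0.2, size=(8, 8))   # activity subsided
    print(process_event(first_ir, second_ir, lambda: rng.uniform(0, 1, (64, 64))))
```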

In some examples, the first group of one or more infrared sensors may be a group of one or more passive infrared sensors. In some examples, the first group of one or more infrared sensors may be identical to the second group of one or more infrared sensors. In some examples, the first group of one or more infrared sensors may be a group of one or more infrared sensors positioned below a second retail shelf, the second retail shelf being positioned above the retail shelf.

In some examples, the determined state of the retail shelf may include an inventory data associated with products on the retail shelf after the engagement of the person with the retail shelf. In some examples, the determined state of the retail shelf may include facings data associated with products on the retail shelf after the engagement of the person with the retail shelf. In some examples, the determined state of the retail shelf may include planogram compliance status associated with the retail shelf after the engagement of the person with the retail shelf.

In some examples, the analysis of the at least one image and an analysis of one or more images of the retail shelf captured using the at least one image sensor before the engagement of the person with the retail shelf may be used to determine a change associated with the retail shelf during the engagement of the person with the retail shelf.

In some examples, the at least one image sensor may be at least one image sensor mounted to a second retail shelf. In some examples, the at least one image sensor may be at least one image sensor mounted to an image capturing robot.

In some examples, for example in response to the determined completion of the engagement of the person with the retail shelf, the capturing of the at least one image of the retail shelf using the at least one image sensor may be triggered.

In some examples, the first infrared input data may be analyzed to determine a type of the engagement of the person with the retail shelf. Further, in some examples, in response to a first determined type of the engagement, analyzing the at least one image of the retail shelf may be triggered, and in response to a second determined type of the engagement, analyzing the at least one image of the retail shelf may be forgone.

In some examples, the first infrared input data may be analyzed to determine a type of the engagement of the person with the retail shelf. Further, in one example, in response to a first determined type of the engagement, a first analysis step may be included in the analysis of the at least one image of the retail shelf, and in response to a second determined type of the engagement, a second analysis step may be included in the analysis of the at least one image of the retail shelf. The second analysis step may differ from the first analysis step.

In some examples, the determination of the completion of the engagement of the person with the retail shelf may be a determination that the person cleared an environment of the retail shelf.

In some examples, a convolution of at least part of the first infrared input data may be calculated. Further, in some examples, in response to a first value of the calculated convolution of the at least part of the first infrared input data, the engagement of a person with a retail shelf may be detected, and in response to a second value of the calculated convolution of the at least part of the first infrared input data, detecting the engagement of a person with a retail shelf may be forgone.
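
The convolution-based branching described above may, purely as an illustration, look like the following sketch, where a hypothetical edge-like kernel and a hypothetical decision value stand in for whatever kernel and values a particular embodiment might use.

```python
import numpy as np

# Hypothetical kernel emphasizing a rapid rise in the infrared time series.
EDGE_KERNEL = np.array([-1.0, 0.0, 1.0])
ENGAGEMENT_VALUE = 0.3  # assumed decision point on the convolution output

def engagement_from_convolution(ir_samples):
    """Convolve an infrared time series and branch on the resulting value."""
    response = np.convolve(ir_samples, EDGE_KERNEL, mode="valid")
    value = float(np.max(np.abs(response)))
    if value >= ENGAGEMENT_VALUE:
        return True, value      # first value: detect the engagement
    return False, value         # second value: forgo the detection

if __name__ == "__main__":
    quiet = np.full(20, 0.1)                                   # no one nearby
    approach = np.concatenate([np.full(10, 0.1), np.full(10, 0.8)])  # step change
    print(engagement_from_convolution(quiet))     # (False, 0.0)
    print(engagement_from_convolution(approach))  # (True, ~0.7)
```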

In some examples, for example in response to the detected engagement of a person with a retail shelf, one or more images of the retail shelf captured before the completion of the engagement of the person with the retail shelf may be analyzed to determine at least one aspect of the engagement. In one example, a virtual shopping cart associated with the person may be updated based on the determined at least one aspect of the engagement. In one example, the analysis of the at least one image of the retail shelf captured after the completion of the engagement of the person with the retail shelf and the determined at least one aspect of the engagement may be used to determine the state of the retail shelf.

In some embodiments, methods, systems, and computer-readable media are provided for triggering image processing based on vibration data analysis.

In some examples, vibration data captured using one or more vibration sensors mounted to a shelving unit including a plurality of retail shelves may be received. The vibration data may be analyzed to determine whether a vibration is a result of an engagement of a person with at least one retail shelf of the plurality of retail shelves. In one example, in response to a determination that the vibration is the result of the engagement of the person with the at least one retail shelf of the plurality of retail shelves, analysis of at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves may be triggered, and in response to a determination that the vibration is not the result of the engagement of the person with the at least one retail shelf of the plurality of retail shelves, triggering the analysis of the at least one image may be forgone. In one example, information may be provided based on a result of the analysis of the at least one image of the at least part of the plurality of retail shelves.
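
As a non-limiting illustration of the vibration-triggered flow above, the sketch below gates image analysis on a simple RMS energy test; the RMS threshold and all function names are hypothetical assumptions rather than part of the disclosed embodiments.

```python
import numpy as np

VIBRATION_RMS_THRESHOLD = 0.05  # assumed level separating engagements from noise

def vibration_is_engagement(vibration):
    """Decide whether the vibration burst looks like a shelf engagement."""
    rms = float(np.sqrt(np.mean(np.square(vibration))))
    return rms >= VIBRATION_RMS_THRESHOLD

def maybe_analyze(vibration, images, analyze):
    """Run image analysis only when the vibration is attributed to an engagement."""
    if not vibration_is_engagement(vibration):
        return None                 # forgo triggering the image analysis
    return [analyze(img) for img in images]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ambient = rng.normal(0.0, 0.01, 500)     # background vibration (e.g., HVAC)
    engagement = rng.normal(0.0, 0.2, 500)   # person handling products on the shelf
    images = [rng.uniform(0, 1, (32, 32)) for _ in range(2)]
    analyze = lambda img: {"mean_intensity": float(img.mean())}
    print(maybe_analyze(ambient, images, analyze))     # None: analysis forgone
    print(maybe_analyze(engagement, images, analyze))  # list of per-image results
```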

In some examples, the plurality of retail shelves may include at least a first retail shelf and a second retail shelf. The vibration data may be analyzed to determine that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves. In one example, for example in response to the determination that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves, including images depicting the second shelf in the at least one image may be avoided.

In some examples, the at least one image may be at least one image of the at least part of the plurality of retail shelves captured after a completion of the engagement of the person with the at least one retail shelf. In one example, the vibration data may be analyzed to determine the completion of the engagement of the person with the at least one retail shelf. In one example, one or more images of the at least one retail shelf may be analyzed to determine the completion of the engagement of the person with the at least one retail shelf. In one example, infrared data captured using at least one infrared sensor may be analyzed to determine a completion of the engagement of the person with the at least one retail shelf. In one example, the analysis of the at least one image of the at least part of the plurality of retail shelves may be used to determine a state of at least one retail shelf after the completion of the engagement. For example, the determined state of the at least one retail shelf may include an inventory data associated with products on the at least one retail shelf after the completion of the engagement, and the inventory data may be determined using the analysis of the at least one image. In another example, the determined state of the at least one retail shelf may include facings data associated with products on the at least one retail shelf after the completion of the engagement, and the facings data may be determined using the analysis of the at least one image. In yet another example, the determined state of the at least one retail shelf may include planogram compliance status of the at least one retail shelf after the completion of the engagement, and the planogram compliance status may be determined using the analysis of the at least one image. In an additional example, the analysis of the at least one image and an analysis of one or more images of the at least one retail shelf captured using the at least one image sensor before the engagement may be used to determine a change associated with the at least one retail shelf during the engagement.

In some examples, the at least one image may be captured using at least one image sensor mounted to a retail shelf not included in the at least one retail shelf. In some examples, the at least one image may be captured using at least one image sensor mounted to an image capturing robot. In some examples, the at least one image may be captured using at least one image sensor mounted to a ceiling of a retail store. In some examples, the at least one image may be captured using at least one image sensor included in a personal mobile device.

In some examples, for example, in response to the determination that the vibration is a result of the engagement of the person with the at least one retail shelf, capturing of the at least one image of the at least part of the plurality of retail shelves may be triggered.

In some examples, the vibration data may be analyzed to determine a type of the engagement of the person with the at least one retail shelf. In one example, in response to a first determined type of the engagement, a first analysis step may be included in the analysis of the at least one image of the at least part of the plurality of retail shelves, and in response to a second determined type of the engagement, a second analysis step may be included in the analysis of the at least one image of the at least part of the plurality of retail shelves, and the second analysis step may differ from the first analysis step.

In some examples, the vibration data may be analyzed to determine a type of the engagement of the person with the at least one retail shelf. In one example, in response to a first determined type of the engagement, the analysis of the at least one image of the at least part of the plurality of retail shelves may be triggered, and in response to a second determined type of the engagement, triggering the analysis of the at least one image of the at least part of the plurality of retail shelves may be forgone.

In some embodiments, methods, systems, and computer-readable media are provided for forgoing image processing in response to infrared data analysis.

In some examples, infrared input data captured using one or more infrared sensors may be received. The infrared input data may be analyzed to detect a presence of an object in an environment of a retail shelf. In one example, in response to no detected presence of an object in the environment of the retail shelf, at least one image of the retail shelf captured using at least one image sensor may be analyzed, and in response to a detection of presence of an object in the environment of the retail shelf, analyzing the at least one image of the retail shelf captured using the at least one image sensor may be forgone.
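
One possible, purely illustrative realization of this occlusion-aware gating is sketched below; the presence threshold and the helper names are assumptions introduced here for illustration only.

```python
import numpy as np

PRESENCE_THRESHOLD = 0.4  # assumed infrared level indicating something in the aisle

def object_present(ir_frame):
    """Return True when the infrared frame suggests an object blocks the shelf view."""
    return float(np.mean(ir_frame)) > PRESENCE_THRESHOLD

def analyze_if_clear(ir_frame, image, analyze):
    """Analyze the shelf image only when nothing occludes the shelf."""
    if object_present(ir_frame):
        return None                 # forgo analyzing the (likely occluded) image
    return analyze(image)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    clear_frame = rng.uniform(0.0, 0.2, (8, 8))      # nothing in front of the shelf
    occluded_frame = rng.uniform(0.5, 0.9, (8, 8))   # person or cart in the aisle
    image = rng.uniform(0, 1, (64, 64))
    analyze = lambda img: {"empty_fraction": float((img < 0.1).mean())}
    print(analyze_if_clear(occluded_frame, image, analyze))  # None: analysis forgone
    print(analyze_if_clear(clear_frame, image, analyze))     # shelf-state result
```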

In some examples, the at least one image sensor may be at least one image sensor mounted to a second retail shelf. In some examples, the at least one image sensor may be at least one image sensor mounted to an image capturing robot. In some examples, the at least one image sensor may be at least one image sensor mounted to a ceiling of a retail store. In some examples, the at least one image sensor may be a part of a personal mobile device.

In some examples, the analysis of the at least one image may be used to determine a state of the retail shelf. In some examples, the environment of the retail shelf may include an area between the at least one image sensor and at least part of the retail shelf. In some examples, the one or more infrared sensors may be one or more infrared sensors physically coupled with the at least one image sensor. In some examples, the one or more infrared sensors may be one or more passive infrared sensors. In some examples, the object may be at least one of a person, a robot, and an inanimate object.

In some examples, the infrared input data may be analyzed to determine a portion of a field of view of the at least one image sensor associated with the object. In one example, in response to a first determined portion of the field of view of the at least one image sensor associated with the object, the at least one image of the retail shelf captured using the at least one image sensor may be analyzed, and in response to a second determined portion of the field of view of the at least one image sensor associated with the object, analyzing the at least one image of the retail shelf captured using the at least one image sensor may be forgone. In one example, the field of view of the at least one image sensor may differ from the field of view of the one or more infrared sensors.

In some examples, the infrared input data may be analyzed to determine a type of the object. In one example, in response to a first determined type of the object, the at least one image of the retail shelf captured using the at least one image sensor may be analyzed, and in response to a second determined type of the object, analyzing the at least one image of the retail shelf captured using the at least one image sensor may be forgone.

In some examples, the infrared input data may be analyzed to determine a duration associated with the presence of an object in the environment of the retail shelf. The determined duration may be compared with a threshold. In one example, in response to a first result of the comparison, the at least one image of the retail shelf captured using the at least one image sensor may be analyzed, and in response to a second result of the comparison, analyzing the at least one image of the retail shelf captured using the at least one image sensor may be forgone. In one example, the threshold may be selected based on at least one product type associated with the retail shelf. In one example, the threshold may be selected based on a status of the retail shelf determined using image analysis of one or more images of the retail shelf captured using the at least one image sensor before the capturing of the infrared input data. In one example, the threshold may be selected based on a time of day.
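
The duration-threshold comparison above could, for illustration only, be realized as in the following sketch. The per-product-type thresholds, the night-time adjustment, and the direction of the decision (longer presence triggers analysis) are hypothetical choices made here, not requirements of the embodiments.

```python
# Hypothetical thresholds, in seconds, keyed by contextual factors.
THRESHOLD_BY_PRODUCT_TYPE = {"canned_goods": 2.0, "cosmetics": 6.0}
NIGHT_MULTIPLIER = 2.0   # assumed relaxation of the threshold outside opening hours

def select_threshold(product_type, hour_of_day):
    """Pick a presence-duration threshold from product type and time of day."""
    base = THRESHOLD_BY_PRODUCT_TYPE.get(product_type, 4.0)
    return base * NIGHT_MULTIPLIER if hour_of_day >= 22 or hour_of_day < 6 else base

def should_analyze(presence_duration, product_type, hour_of_day):
    """Analyze the image only when the presence lasted long enough to matter."""
    return presence_duration >= select_threshold(product_type, hour_of_day)

if __name__ == "__main__":
    print(should_analyze(1.0, "canned_goods", 14))  # False: presence too short
    print(should_analyze(3.0, "canned_goods", 14))  # True: above daytime threshold
    print(should_analyze(3.0, "canned_goods", 23))  # False: night threshold is higher
```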

In some examples, in response to no detected presence of an object in the environment of the retail shelf, the at least one image of the retail shelf may be captured using the at least one image sensor, and in response to a detection of presence of an object in the environment of the retail shelf, the capturing of the at least one image of the retail shelf may be forgone.

In some embodiments, methods, systems, and computer-readable media are provided for robust action recognition in a retail environment.

In some examples, infrared data captured using one or more infrared sensors from a retail environment may be received. Further, at least one image captured using at least one image sensor from the retail environment may be received. The infrared data and the at least one image may be analyzed to detect an action performed in the retail environment. In one example, information based on the detected action may be provided.

In some examples, the action may include at least one of picking a product from a retail shelf, placing a product on a retail shelf, and moving a product on a retail shelf. In some examples, detecting the action performed in the retail environment may include recognizing a type of the action. In some examples, detecting the action performed in the retail environment may include at least one of identifying a product type associated with the action and determining a quantity of products associated with the action. In some examples, the at least one image may include at least one three-dimensional image.

In some examples, a convolution of at least part of the at least one image may be calculated to obtain a value of the calculated convolution. Further, the value of the calculated convolution may be used to analyze the infrared data to detect the action performed in the retail environment.

In some examples, a convolution of at least part of the infrared data may be calculated to obtain a value of the calculated convolution. Further, the value of the calculated convolution may be used to analyze the at least one image to detect the action performed in the retail environment.

In some examples, a convolution of at least part of the at least one image may be calculated to obtain a value of the calculated convolution. Further, the infrared data may be analyzed to determine a wavelength associated with the infrared data. In one example, in response to a first combination of the value of the calculated convolution and the wavelength associated with the infrared data, the action performed in the retail environment may be detected, and in response to a second combination of the value of the calculated convolution and the wavelength associated with the infrared data, the detection of the action performed in the retail environment may be forgone.
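
As a hedged illustration of combining an image-convolution value with an infrared wavelength, the sketch below assumes a long-wave infrared band (roughly consistent with thermal emission from a person) and an edge-response decision point; both the band and the threshold are assumptions made here for illustration and are not prescribed by the embodiments.

```python
import numpy as np

CONV_THRESHOLD = 0.5           # assumed minimum edge response in the image
BODY_HEAT_BAND = (8.0, 14.0)   # assumed long-wave infrared band, in micrometers

def image_convolution_value(image):
    """Maximum absolute response of a small edge-like kernel over the image."""
    kernel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
    h, w = image.shape
    kh, kw = kernel.shape
    best = 0.0
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            best = max(best, abs(float(np.sum(image[i:i + kh, j:j + kw] * kernel))))
    return best

def detect_action(image, ir_wavelength_um):
    """Detect an action only for one combination of convolution value and wavelength."""
    value = image_convolution_value(image)
    in_band = BODY_HEAT_BAND[0] <= ir_wavelength_um <= BODY_HEAT_BAND[1]
    if value >= CONV_THRESHOLD and in_band:
        return True    # first combination: detect the action
    return False       # second combination: forgo the detection

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    image = rng.uniform(0, 1, (16, 16))
    print(detect_action(image, ir_wavelength_um=10.0))  # likely True
    print(detect_action(image, ir_wavelength_um=1.5))   # False: outside assumed band
```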

In some examples, the infrared data may include a time series of samples captured using the one or more infrared sensors at different points in time. In one example, the time series of samples may be analyzed to select the at least one image of a plurality of images. In one example, two samples of the time series of samples may be compared to one another, and a result of the comparison may be used to analyze the at least one image to detect the action performed in the retail environment.

In some examples, the at least one image may include a plurality of frames of a video captured using the at least one image sensor. In one example, two frames of the plurality of frames may be compared to one another, and a result of the comparison may be used to analyze the infrared data to detect the action performed in the retail environment.

In some examples, the infrared data may be analyzed to select a portion of the at least one image, and the selected portion of the at least one image may be analyzed to detect the action performed in the retail environment.

In some examples, the infrared data may be analyzed to attempt to detect the action performed in the retail environment, and in response to a failure of the attempt to successfully detect the action, the at least one image may be analyzed to detect the action performed in the retail environment. In one example, the failure to successfully detect the action may be a failure to successfully detect the action at a confidence level higher than a selected threshold. In another example, the failure to successfully detect the action may be a failure to determine at least one aspect of the action. In yet another example, in response to a failure to successfully detect the action, the capturing of the at least one image using the at least one image sensor may be triggered.
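
One illustrative way to realize the fallback behavior above is sketched below: an infrared-only detector is tried first, and an image-based detector is consulted only when the confidence falls short of a hypothetical threshold. Both detectors, the confidence threshold, and the returned labels are placeholders introduced for illustration.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed minimum confidence for accepting a detection

def detect_from_infrared(ir_data):
    """Hypothetical infrared-only detector returning (action, confidence)."""
    level = sum(ir_data) / len(ir_data)
    return ("pick", min(level, 1.0))

def detect_from_image(image_summary):
    """Hypothetical image-based detector used as a fallback."""
    return ("pick" if image_summary["missing_items"] > 0 else "none", 0.95)

def robust_detect(ir_data, capture_and_summarize_image):
    """Try infrared first; analyze an image only when the attempt falls short."""
    action, confidence = detect_from_infrared(ir_data)
    if confidence >= CONFIDENCE_THRESHOLD:
        return action, confidence
    # Failure to detect at sufficient confidence: trigger image capture and analysis.
    return detect_from_image(capture_and_summarize_image())

if __name__ == "__main__":
    strong_ir = [0.9, 0.95, 0.85]
    weak_ir = [0.2, 0.3, 0.25]
    summarize = lambda: {"missing_items": 1}
    print(robust_detect(strong_ir, summarize))  # decided from infrared alone
    print(robust_detect(weak_ir, summarize))    # falls back to the image analysis
```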

In some embodiments, methods, systems, and computer-readable media are provided for using vibration data analysis and image analysis for robust action recognition in a retail environment.

In some examples, vibration data captured using one or more vibration sensors mounted to a shelving unit including at least one retail shelf may be received. Further, at least one image captured using at least one image sensor from a retail environment including the shelving unit may be received. The vibration data and the at least one image may be analyzed to detect an action performed in the retail environment. In one example, information based on the detected action may be provided.

In some examples, the action may include at least one of picking a product from a retail shelf, placing a product on a retail shelf, and moving a product on a retail shelf. In some examples, detecting the action performed in the retail environment may include recognizing a type of the action. In some examples, detecting the action performed in the retail environment may include at least one of identifying a product type associated with the action and determining a quantity of products associated with the action. In some examples, the at least one image may include at least one three-dimensional image.

In some examples, a convolution of at least part of the at least one image may be calculated to obtain a value of the calculated convolution. Further, the value of the calculated convolution may be used to analyze the vibration data to detect the action performed in the retail environment.

In some examples, a convolution of at least part of the vibration data may be calculated to obtain a value of the calculated convolution. Further, the value of the calculated convolution may be used to analyze the at least one image to detect the action performed in the retail environment.

In some examples, a convolution of at least part of the at least one image may be calculated to obtain a value of the calculated convolution. Further, the vibration data may be analyzed to determine a frequency associated with the vibration data. In one example, in response to a first combination of the value of the calculated convolution and the frequency associated with the vibration data, the action performed in the retail environment may be detected, and in response to a second combination of the value of the calculated convolution and the frequency associated with the vibration data, the detection of the action performed in the retail environment may be forgone.
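
A purely illustrative sketch of combining an image-convolution value with a vibration frequency follows. The decision thresholds, the assumed frequency band for a hand disturbing a shelf, and the FFT-based dominant-frequency estimate are choices made here for illustration only, not part of the disclosed embodiments.

```python
import numpy as np

CONV_THRESHOLD = 0.5           # assumed minimum edge response in the image
FREQ_BAND_HZ = (2.0, 20.0)     # assumed band typical of a hand disturbing a shelf

def dominant_frequency(vibration, sample_rate_hz):
    """Return the dominant frequency of the vibration signal via an FFT."""
    spectrum = np.abs(np.fft.rfft(vibration - np.mean(vibration)))
    freqs = np.fft.rfftfreq(len(vibration), d=1.0 / sample_rate_hz)
    return float(freqs[int(np.argmax(spectrum[1:]) + 1)])  # skip the DC bin

def detect_action(image, vibration, sample_rate_hz):
    """Detect an action only for one combination of convolution value and frequency."""
    value = float(np.max(np.abs(np.convolve(image.ravel(), [1.0, -1.0], mode="valid"))))
    freq = dominant_frequency(vibration, sample_rate_hz)
    if value >= CONV_THRESHOLD and FREQ_BAND_HZ[0] <= freq <= FREQ_BAND_HZ[1]:
        return True    # first combination: detect the action
    return False       # second combination: forgo the detection

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    image = rng.uniform(0, 1, (16, 16))
    t = np.arange(0, 1.0, 1.0 / 200.0)
    shelf_touch = 0.3 * np.sin(2 * np.pi * 8.0 * t)   # 8 Hz, inside the assumed band
    hvac_rumble = 0.3 * np.sin(2 * np.pi * 60.0 * t)  # 60 Hz, outside the assumed band
    print(detect_action(image, shelf_touch, 200.0))   # likely True
    print(detect_action(image, hvac_rumble, 200.0))   # False
```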

In some examples, the vibration data may include a time series of samples captured using the one or more vibration sensors at different points in time. For example, the time series of samples may be analyzed to select the at least one image of a plurality of images. In another example, two samples of the time series of samples may be compared to one another, and a result of the comparison may be used to analyze the at least one image to detect the action performed in the retail environment.

In some examples, the at least one image may include a plurality of frames of a video captured using the at least one image sensor. In one example, two frames of the plurality of frames may be compared to one another, and a result of the comparison may be used to analyze the vibration data to detect the action performed in the retail environment.

In some examples, the vibration data may be analyzed to select a portion of the at least one image, and the selected portion of the at least one image may be analyzed to detect the action performed in the retail environment.

In some examples, the vibration data may be analyzed to attempt to detect the action performed in the retail environment, and in response to a failure of the attempt to successfully detect the action, the at least one image may be analyzed to detect the action performed in the retail environment. In one example, the failure to successfully detect the action may be a failure to successfully detect the action at a confidence level higher than a selected threshold. In another example, the failure to successfully detect the action may be a failure to determine at least one aspect of the action. In one example, for example, in response to a failure to successfully detect the action, the capturing of the at least one image using the at least one image sensor may be triggered.

Consistent with other disclosed embodiments, a non-transitory computer-readable medium may include instructions that, when executed by a processor, cause the processor to perform any of the methods described herein.

The foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate various disclosed embodiments. In the drawings:

FIG. 1 is an illustration of an exemplary system for analyzing information collected from a retail store.

FIG. 2 is a block diagram that illustrates some of the components of an image processing system, consistent with the present disclosure.

FIG. 3 is a block diagram that illustrates an exemplary embodiment of a capturing device, consistent with the present disclosure.

FIG. 4A is a schematic illustration of an example configuration for capturing image data in a retail store, consistent with the present disclosure.

FIG. 4B is a schematic illustration of another example configuration for capturing image data in a retail store, consistent with the present disclosure.

FIG. 4C is a schematic illustration of another example configuration for capturing image data in a retail store, consistent with the present disclosure.

FIG. 5A is an illustration of an example system for acquiring images of products in a retail store, consistent with the present disclosure.

FIG. 5B is an illustration of a shelf-mounted camera unit included in a first housing of the example system of FIG. 5A, consistent with the present disclosure.

FIG. 5C is an exploded view illustration of a processing unit included in a second housing of the example system of FIG. 5A, consistent with the present disclosure.

FIG. 6A is a top view representation of an aisle in a retail store with multiple image acquisition systems deployed thereon for acquiring images of products, consistent with the present disclosure.

FIG. 6B is a perspective view representation of part of a retail shelving unit with multiple image acquisition systems deployed thereon for acquiring images of products, consistent with the present disclosure.

FIG. 6C provides a diagrammatic representation of how the exemplary disclosed image acquisition systems may be positioned relative to retail shelving to acquire product images, consistent with the present disclosure.

FIG. 7A provides a flowchart of an exemplary method for acquiring images of products in a retail store, consistent with the present disclosure.

FIG. 7B provides a flowchart of a method for acquiring images of products in a retail store, consistent with the present disclosure.

FIG. 8A is a schematic illustration of an example configuration for detecting products and empty spaces on a store shelf, consistent with the present disclosure.

FIG. 8B is a schematic illustration of another example configuration for detecting products and empty spaces on a store shelf, consistent with the present disclosure.

FIG. 9 is a schematic illustration of example configurations for detection elements on store shelves, consistent with the present disclosure.

FIG. 10A illustrates an exemplary method for monitoring planogram compliance on a store shelf, consistent with the present disclosure.

FIG. 10B illustrates an exemplary method for triggering image acquisition based on product events on a store shelf, consistent with the present disclosure.

FIG. 11A is a schematic illustration of an example output for a market research entity associated with the retail store, consistent with the present disclosure.

FIG. 11B is a schematic illustration of an example output for a supplier of the retail store, consistent with the present disclosure.

FIG. 11C is a schematic illustration of an example output for a manager of the retail store, consistent with the present disclosure.

FIG. 11D is a schematic illustration of two example outputs for an employee of the retail store, consistent with the present disclosure.

FIG. 11E is a schematic illustration of an example output for an online customer of the retail store, consistent with the present disclosure.

FIG. 12 provides a flowchart of an exemplary method for triggering image processing based on infrared data analysis, consistent with the present disclosure.

FIG. 13 provides a flowchart of an exemplary method for triggering image processing based on vibration data analysis, consistent with the present disclosure.

FIG. 14 provides a flowchart of an exemplary method for forgoing image processing in response to infrared data analysis, consistent with the present disclosure.

FIG. 15 provides a flowchart of an exemplary method for using infrared data analysis and image analysis for robust action recognition in a retail environment, consistent with the present disclosure.

FIG. 16 provides a flowchart of an exemplary method for using vibration data analysis and image analysis for robust action recognition in a retail environment, consistent with the present disclosure.

DETAILED DESCRIPTION

The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the following description to refer to the same or similar parts. While several illustrative embodiments are described herein, modifications, adaptations and other implementations are possible. For example, substitutions, additions, or modifications may be made to the components illustrated in the drawings, and the illustrative methods described herein may be modified by substituting, reordering, removing, or adding steps to the disclosed methods. Accordingly, the following detailed description is not limited to the disclosed embodiments and examples. Instead, the proper scope is defined by the appended claims.

The present disclosure is directed to systems and methods for processing images captured in a retail store. As used herein, the term “retail store” or simply “store” refers to an establishment offering products for sale by direct selection by customers physically or virtually shopping within the establishment. The retail store may be an establishment operated by a single retailer (e.g., supermarket) or an establishment that includes stores operated by multiple retailers (e.g., a shopping mall). Embodiments of the present disclosure include receiving an image depicting a store shelf having at least one product displayed thereon. As used herein, the term “store shelf” or simply “shelf” refers to any suitable physical structure which may be used for displaying products in a retail environment. In one embodiment, the store shelf may be part of a shelving unit including a number of individual store shelves. In another embodiment, the store shelf may include a display unit having single-level or multi-level surfaces.

Consistent with the present disclosure, the system may process images and image data acquired by a capturing device to determine information associated with products displayed in the retail store. The term “capturing device” refers to any device configured to acquire image data representative of products displayed in the retail store. Examples of capturing devices may include a digital camera, a time-of-flight camera, a stereo camera, an active stereo camera, a depth camera, a Lidar system, a laser scanner, CCD based devices, or any other sensor based system capable of converting received light into electric signals. The term “image data” refers to any form of data generated based on optical signals in the near-infrared, infrared, visible, and ultraviolet spectrums (or any other suitable radiation frequency range). Consistent with the present disclosure, the image data may include pixel data streams, digital images, digital video streams, data derived from captured images, and data that may be used to construct a 3D image. The image data acquired by a capturing device may be transmitted by wired or wireless transmission to a remote server. In one embodiment, the capturing device may include a stationary camera with communication layers (e.g., a dedicated camera fixed to a store shelf, a security camera, and so forth). Such an embodiment is described in greater detail below with reference to FIG. 4A. In another embodiment, the capturing device may include a handheld device (e.g., a smartphone, a tablet, a mobile station, a personal digital assistant, a laptop, and more) or a wearable device (e.g., smart glasses, a smartwatch, a clip-on camera). Such an embodiment is described in greater detail below with reference to FIG. 4B. In another embodiment, the capturing device may include a robotic device with one or more cameras operated remotely or autonomously (e.g., an autonomous robotic device, a drone, a robot on a track, and more). Such an embodiment is described in greater detail below with reference to FIG. 4C.

In some embodiments, the capturing device may include one or more image sensors. The term “image sensor” refers to a device capable of detecting and converting optical signals in the near-infrared, infrared, visible, and ultraviolet spectrums into electrical signals. The electrical signals may be used to form image data (e.g., an image or a video stream) based on the detected signal. Examples of image sensors may include semiconductor charge-coupled devices (CCD), active pixel sensors in complementary metal-oxide-semiconductor (CMOS), or N-type metal-oxide-semiconductors (NMOS, Live MOS). In some cases, the image sensor may be part of a camera included in the capturing device.

Embodiments of the present disclosure further include analyzing images to detect and identify different products. As used herein, the term “detecting a product” may broadly refer to determining an existence of the product. For example, the system may determine the existence of a plurality of distinct products displayed on a store shelf. By detecting the plurality of products, the system may acquire different details relative to the plurality of products (e.g., how many products on a store shelf are associated with a same product type), but it does not necessarily gain knowledge of the type of product. In contrast, the term “identifying a product” may refer to determining a unique identifier associated with a specific type of product that allows inventory managers to uniquely refer to each product type in a product catalogue. Additionally or alternatively, the term “identifying a product” may refer to determining a unique identifier associated with a specific brand of products that allows inventory managers to uniquely refer to products, e.g., based on a specific brand in a product catalogue. Additionally or alternatively, the term “identifying a product” may refer to determining a unique identifier associated with a specific category of products that allows inventory managers to uniquely refer to products, e.g., based on a specific category in a product catalogue. In some embodiments, the identification may be made based at least in part on visual characteristics of the product (e.g., size, shape, logo, text, color, and so forth). The unique identifier may include any codes that may be used to search a catalog, such as a series of digits, letters, symbols, or any combinations of digits, letters, and symbols. Consistent with the present disclosure, the terms “determining a type of a product” and “determining a product type” may also be used interchangeably in this disclosure with reference to the term “identifying a product.”

Embodiments of the present disclosure further include determining at least one characteristic of the product for determining the type of the product. As used herein, the term “characteristic of the product” refers to one or more visually discernable features attributed to the product. Consistent with the present disclosure, the characteristic of the product may assist in classifying and identifying the product. For example, the characteristic of the product may be associated with the ornamental design of the product, the size of the product, the shape of the product, the colors of the product, the brand of the product, a logo or text associated with the product (e.g., on a product label), and more. In addition, embodiments of the present disclosure further include determining a confidence level associated with the determined type of the product. The term “confidence level” refers to any indication, numeric or otherwise, of a level (e.g., within a predetermined range) indicative of an amount of confidence the system has that the determined type of the product is the actual type of the product. For example, the confidence level may have a value between 1 and 10; alternatively, the confidence level may be expressed as a percentage.

In some cases, the system may compare the confidence level to a threshold. The term “threshold” as used herein denotes a reference value, a level, a point, or a range of values, for which, when the confidence level is above it (or below it depending on a particular use case), the system may follow a first course of action and, when the confidence level is below it (or above it depending on a particular use case), the system may follow a second course of action. The value of the threshold may be predetermined for each type of product or may be dynamically selected based on different considerations. In one embodiment, when the confidence level associated with a certain product is below a threshold, the system may obtain contextual information to increase the confidence level. As used herein, the term “contextual information” (or “context”) refers to any information having a direct or indirect relationship with a product displayed on a store shelf. In some embodiments, the system may retrieve different types of contextual information from captured image data and/or from other data sources. In some cases, contextual information may include recognized types of products adjacent to the product under examination. In other cases, contextual information may include text appearing on the product, especially where that text may be recognized (e.g., via OCR) and associated with a particular meaning. Other examples of types of contextual information may include logos appearing on the product, a location of the product in the retail store, a brand name of the product, a price of the product, product information collected from multiple retail stores, product information retrieved from a catalog associated with a retail store, etc.
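
For illustration only, the sketch below shows one way contextual information (here, recognized neighboring products) could raise a below-threshold confidence level; the threshold, the bonus value, and the product identifiers are hypothetical and stand in for whatever values and context sources a particular embodiment might use.

```python
CONFIDENCE_THRESHOLD = 0.75  # assumed acceptance threshold
CONTEXT_BONUS = 0.1          # assumed boost when the context supports the candidate

def identify_with_context(base_confidence, candidate_type, neighbor_types):
    """Raise a low confidence using contextual information from adjacent products."""
    confidence = base_confidence
    if confidence < CONFIDENCE_THRESHOLD:
        # Contextual information: neighboring products of the same type make the
        # candidate identification more plausible, so nudge the confidence upward.
        if candidate_type in neighbor_types:
            confidence = min(1.0, confidence + CONTEXT_BONUS)
    accepted = confidence >= CONFIDENCE_THRESHOLD
    return accepted, confidence

if __name__ == "__main__":
    print(identify_with_context(0.80, "cola_330ml", []))                 # accepted outright
    print(identify_with_context(0.70, "cola_330ml", ["cola_330ml"]))     # accepted via context
    print(identify_with_context(0.60, "cola_330ml", ["shampoo_400ml"]))  # still rejected
```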

Reference is now made to FIG. 1, which shows an example of a system 100 for analyzing information collected from retail stores 105 (for example, retail store 105A, retail store 105B, and retail store 105C). In one embodiment, system 100 may represent a computer-based system that may include computer system components, desktop computers, workstations, tablets, handheld computing devices, memory devices, and/or internal network(s) connecting the components. System 100 may include or be connected to various network computing resources (e.g., servers, routers, switches, network connections, storage devices, etc.) necessary to support the services provided by system 100. In one embodiment, system 100 may enable identification of products in retail stores 105 based on analysis of captured images. In another embodiment, system 100 may enable a supply of information based on analysis of captured images to a market research entity 110 and to different suppliers 115 of the identified products in retail stores 105 (for example, supplier 115A, supplier 115B, and supplier 115C). In another embodiment, system 100 may communicate with a user 120 (sometimes referred to herein as a customer, but which may include individuals associated with a retail environment other than customers, such as a store employee, a data collection agent, etc.) about different products in retail stores 105. In one example, system 100 may receive images of products captured by user 120. In another example, system 100 may provide to user 120 information determined based on automatic machine analysis of images captured by one or more capturing devices 125 associated with retail stores 105.

System 100 may also include an image processing unit 130 to execute the analysis of images captured by the one or more capturing devices 125. Image processing unit 130 may include a server 135 operatively connected to a database 140. Image processing unit 130 may include one or more servers connected by a communication network, a cloud platform, and so forth. Consistent with the present disclosure, image processing unit 130 may receive raw or processed data from capturing device 125 via respective communication links, and provide information to different system components using a network 150. Specifically, image processing unit 130 may use any suitable image analysis technique including, for example, object recognition, object detection, image segmentation, feature extraction, optical character recognition (OCR), object-based image analysis, shape region techniques, edge detection techniques, pixel-based detection, artificial neural networks, convolutional neural networks, etc. In addition, image processing unit 130 may use classification algorithms to distinguish between the different products in the retail store. In some embodiments, image processing unit 130 may utilize suitably trained machine learning algorithms and models to perform the product identification. Network 150 may facilitate communications and data exchange between different system components when these components are coupled to network 150 to enable output of data derived from the images captured by the one or more capturing devices 125. In some examples, the types of outputs that image processing unit 130 can generate may include identification of products, indicators of product quantity, indicators of planogram compliance, indicators of service-improvement events (e.g., a cleaning event, a restocking event, a rearrangement event, etc.), and various reports indicative of the performances of retail stores 105. Additional examples of the different outputs enabled by image processing unit 130 are described below with reference to FIGS. 11A-11E and throughout the disclosure.

Consistent with the present disclosure, network 150 may be any type of network (including infrastructure) that provides communications, exchanges information, and/or facilitates the exchange of information between the components of system 100. For example, network 150 may include or be part of the Internet, a Local Area Network, a wireless network (e.g., a Wi-Fi/802.11 network), or other suitable connections. In other embodiments, one or more components of system 100 may communicate directly through dedicated communication links, such as, for example, a telephone network, an extranet, an intranet, the Internet, satellite communications, off-line communications, wireless communications, transponder communications, a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), and so forth.

In one example configuration, server 135 may be a cloud server that processes images received directly (or indirectly) from one or more capturing devices 125 and processes the images to detect and/or identify at least some of the plurality of products in the image based on visual characteristics of the plurality of products. The term “cloud server” refers to a computer platform that provides services via a network, such as the Internet. In this example configuration, server 135 may use virtual machines that may not correspond to individual hardware. For example, computational and/or storage capabilities may be implemented by allocating appropriate portions of desirable computation/storage power from a scalable repository, such as a data center or a distributed computing environment. In one example, server 135 may implement the methods described herein using customized hard-wired logic, one or more Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs), firmware, and/or program logic which, in combination with the computer system, cause server 135 to be a special-purpose machine.

In another example configuration, server 135 may be part of a system associated with a retail store that communicates with capturing device 125 using a wireless local area network (WLAN) and may provide similar functionality as a cloud server. In this example configuration, server 135 may communicate with an associated cloud server (not shown) and cloud database (not shown). The communications between the store server and the cloud server may be used in a quality enforcement process, for upgrading the recognition engine and the software from time to time, for extracting information from the store level to other data users, and so forth. Consistent with another embodiment, the communications between the store server and the cloud server may be discontinuous (purposely or unintentionally) and the store server may be configured to operate independently from the cloud server. For example, the store server may be configured to generate a record indicative of changes in product placement that occurred when there was a limited connection (or no connection) between the store server and the cloud server, and to forward the record to the cloud server once connection is reestablished.

As depicted in FIG. 1, server 135 may be coupled to one or more physical or virtual storage devices such as database 140. Server 135 may access database 140 to detect and/or identify products. The detection may occur through analysis of features in the image using an algorithm and stored data. The identification may occur through analysis of product features in the image according to stored product models. Consistent with the present embodiment, the term “product model” refers to any type of algorithm or stored product data that a processor may access or execute to enable the identification of a particular product associated with the product model. For example, the product model may include a description of visual and contextual properties of the particular product (e.g., the shape, the size, the colors, the texture, the brand name, the price, the logo, text appearing on the particular product, the shelf associated with the particular product, adjacent products in a planogram, the location within the retail store, and so forth). In some embodiments, a single product model may be used by server 135 to identify more than one type of products, such as, when two or more product models are used in combination to enable identification of a product. For example, in some cases, a first product model may be used by server 135 to identify a product category (such models may apply to multiple product types, e.g., shampoo, soft drinks, etc.), and a second product model may be used by server 135 to identify the product type, product identity, or other characteristics associated with a product. In some cases, such product models may be applied together (e.g., in series, in parallel, in a cascade fashion, in a decision tree fashion, etc.) to reach a product identification. In other embodiments, a single product model may be used by server 135 to identify a particular product type (e.g., 6-pack of 16 oz Coca-Cola Zero).
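
As a non-limiting illustration of applying product models in a cascade, the sketch below uses a hypothetical keyword-based category model followed by a type model; real product models would typically be trained classifiers rather than keyword sets, and the catalog entries shown are placeholders.

```python
# Hypothetical two-stage cascade: a category model narrows the search, a type
# model then resolves the exact product within that category.
CATEGORY_KEYWORDS = {"soft_drinks": {"red", "can"}, "shampoo": {"white", "bottle"}}
TYPE_CATALOG = {
    "soft_drinks": {("red", "can", "zero"): "6-pack 16 oz Coca-Cola Zero"},
    "shampoo": {("white", "bottle", "anti-dandruff"): "Anti-dandruff shampoo 400 ml"},
}

def identify_product(visual_features):
    """Apply the category model first, then the type model, in a cascade."""
    features = set(visual_features)
    for category, keywords in CATEGORY_KEYWORDS.items():
        if keywords <= features:                      # category model matched
            for key, product in TYPE_CATALOG[category].items():
                if set(key) <= features:              # type model matched
                    return category, product
            return category, None                     # category known, type unknown
    return None, None                                 # no category matched

if __name__ == "__main__":
    print(identify_product(["red", "can", "zero", "logo"]))
    print(identify_product(["white", "bottle"]))
```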

Database 140 may be included on a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible or non-transitory computer-readable medium. Database 140 may also be part of server 135 or separate from server 135. When database 140 is not part of server 135, server 135 may exchange data with database 140 via a communication link. Database 140 may include one or more memory devices that store data and instructions used to perform one or more features of the disclosed embodiments. In one embodiment, database 140 may include any suitable databases, ranging from small databases hosted on a workstation to large databases distributed among data centers. Database 140 may also include any combination of one or more databases controlled by memory controller devices (e.g., server(s), etc.) or software. For example, database 140 may include document management systems, Microsoft SQL databases, SharePoint databases, Oracle™ databases, Sybase™ databases, other relational databases, or non-relational databases, such as mongo and others.

Consistent with the present disclosure, image processing unit 130 may communicate with output devices 145 to present information derived based on processing of image data acquired by capturing devices 125. The term “output device” is intended to include all possible types of devices capable of outputting information from server 135 to users or other computer systems (e.g., a display screen, a speaker, a desktop computer, a laptop computer, mobile device, tablet, a PDA, etc.), such as 145A, 145B, 145C and 145D. In one embodiment, each of the different system components (i.e., retail stores 105, market research entity 110, suppliers 115, and users 120) may be associated with an output device 145, and each system component may be configured to present different information on the output device 145. In one example, server 135 may analyze acquired images including representations of shelf spaces. Based on this analysis, server 135 may compare shelf spaces associated with different products, and output device 145A may present market research entity 110 with information about the shelf spaces associated with different products. The shelf spaces may also be compared with sales data, expired products data, and more. Consistent with the present disclosure, market research entity 110 may be a part of (or may work with) supplier 115. In another example, server 135 may determine product compliance to a predetermined planogram, and output device 145B may present to supplier 115 information about the level of product compliance at one or more retail stores 105 (for example in a specific retail store 105, in a group of retail stores 105 associated with supplier 115, in all retail stores 105, and so forth). The predetermined planogram may be associated with contractual obligations and/or other preferences related to the retailer methodology for placement of products on the store shelves. In another example, server 135 may determine that a specific store shelf has a type of fault in the product placement, and output device 145C may present to a manager of retail store 105 a user-notification that may include information about a correct display location of a misplaced product, information about a store shelf associated with the misplaced product, information about a type of the misplaced product, and/or a visual depiction of the misplaced product. In another example, server 135 may identify which products are available on the shelf and output device 145D may present to user 120 an updated list of products.

The components and arrangements shown in FIG. 1 are not intended to limit the disclosed embodiments, as the system components used to implement the disclosed processes and features may vary. In one embodiment, system 100 may include multiple servers 135, and each server 135 may host a certain type of service. For example, a first server may process images received from capturing devices 125 to identify at least some of the plurality of products in the image, and a second server may determine from the identified products in retail stores 105 compliance with contractual obligations between retail stores 105 and suppliers 115. In another embodiment, system 100 may include multiple servers 135, a first type of servers 135 that may process information from specific capturing devices 125 (e.g., handheld devices of data collection agents) or from specific retail stores 105 (e.g., a server dedicated to a specific retail store 105 may be placed in or near the store). System 100 may further include a second type of servers 135 that collect and process information from the first type of servers 135.

FIG. 2 is a block diagram representative of an example configuration of server 135. In one embodiment, server 135 may include a bus 200 (or any other communication mechanism) that interconnects subsystems and components for transferring information within server 135. For example, bus 200 may interconnect a processing device 202, a memory interface 204, a network interface 206, and a peripherals interface 208 connected to an I/O system 210.

Processing device 202, shown in FIG. 2, may include at least one processor configured to execute computer programs, applications, methods, processes, or other software to execute particular instructions associated with embodiments described in the present disclosure. The term “processing device” refers to any physical device having an electric circuit that performs a logic operation. For example, processing device 202 may include one or more processors, integrated circuits, microchips, microcontrollers, microprocessors, all or part of a central processing unit (CPU), graphics processing unit (GPU), digital signal processor (DSP), field programmable gate array (FPGA), or other circuits suitable for executing instructions or performing logic operations. Processing device 202 may include at least one processor configured to perform functions of the disclosed methods, such as a microprocessor manufactured by Intel™, Nvidia™, AMD™, and so forth. Processing device 202 may include a single core or multiple core processors executing parallel processes simultaneously. In one example, processing device 202 may be a single core processor configured with virtual processing technologies. Processing device 202 may implement virtual machine technologies or other technologies to provide the ability to execute, control, run, manipulate, store, etc., multiple software processes, applications, programs, etc. In another example, processing device 202 may include a multiple-core processor arrangement (e.g., dual, quad core, etc.) configured to provide parallel processing functionalities to allow a device associated with processing device 202 to execute multiple processes simultaneously. It is appreciated that other types of processor arrangements could be implemented to provide the capabilities disclosed herein.

Consistent with the present disclosure, the methods and processes disclosed herein may be performed by server 135 as a result of processing device 202 executing one or more sequences of one or more instructions contained in a non-transitory computer-readable storage medium. As used herein, a non-transitory computer-readable storage medium refers to any type of physical memory on which information or data readable by at least one processor can be stored. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM or any other flash memory, NVRAM, a cache, a register, any other memory chip or cartridge, and networked versions of the same. The terms “memory” and “computer-readable storage medium” may refer to multiple structures, such as a plurality of memories or computer-readable storage mediums located within server 135, or at a remote location. Additionally, one or more computer-readable storage mediums can be utilized in implementing a computer-implemented method. The term “computer-readable storage medium” should be understood to include tangible items and exclude carrier waves and transient signals.

According to one embodiment, server 135 may include network interface 206 (which may also be any communications interface) coupled to bus 200. Network interface 206 may provide one-way or two-way data communication to a local network, such as network 150. Network interface 206 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, network interface 206 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. In another embodiment, network interface 206 may include an Ethernet port connected to radio frequency receivers and transmitters and/or optical (e.g., infrared) receivers and transmitters. The specific design and implementation of network interface 206 depends on the communications network(s) over which server 135 is intended to operate. As described above, server 135 may be a cloud server or a local server associated with retail store 105. In any such implementation, network interface 206 may be configured to send and receive electrical, electromagnetic, or optical signals, through wires or wirelessly, that may carry analog or digital data streams representing various types of information. In another example, the implementation of network interface 206 may be similar or identical to the implementation described below for network interface 306.

Server 135 may also include peripherals interface 208 coupled to bus 200. Peripherals interface 208 may be connected to sensors, devices, and subsystems to facilitate multiple functionalities. In one embodiment, peripherals interface 208 may be connected to I/O system 210 configured to receive signals or input from devices and provide signals or output to one or more devices that allow data to be received and/or transmitted by server 135. In one embodiment, I/O system 210 may include or be associated with output device 145. For example, I/O system 210 may include a touch screen controller 212, an audio controller 214, and/or other input controller(s) 216. Touch screen controller 212 may be coupled to a touch screen 218. Touch screen 218 and touch screen controller 212 can, for example, detect contact, movement, or break thereof using any of a plurality of touch sensitivity technologies, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with touch screen 218. Touch screen 218 may also, for example, be used to implement virtual or soft buttons and/or a keyboard. In addition to or instead of touch screen 218, I/O system 210 may include a display screen (e.g., CRT, LCD, etc.), virtual reality device, augmented reality device, and so forth. Specifically, touch screen controller 212 (or display screen controller) and touch screen 218 (or any of the alternatives mentioned above) may facilitate visual output from server 135. Audio controller 214 may be coupled to a microphone 220 and a speaker 222 to facilitate voice-enabled functions, such as voice recognition, voice replication, digital recording, and telephony functions. Specifically, audio controller 214 and speaker 222 may facilitate audio output from server 135. The other input controller(s) 216 may be coupled to other input/control devices 224, such as one or more buttons, keyboards, rocker switches, thumb-wheel, infrared port, USB port, image sensors, motion sensors, depth sensors, and/or a pointer device such as a computer mouse or a stylus.

In some embodiments, processing device 202 may use memory interface 204 to access data and a software product stored on a memory device 226. Memory device 226 may include operating system programs for server 135 that perform operating system functions when executed by the processing device. By way of example, the operating system programs may include Microsoft Windows™, Unix™, Linux™, Apple™ operating systems, personal digital assistant (PDA) type operating systems such as Apple iOS, Google Android, Blackberry OS, or other types of operating systems.

Memory device 226 may also store communication instructions 228 to facilitate communicating with one or more additional devices (e.g., capturing device 125), one or more computers (e.g., output devices 145A-145D), and/or one or more servers. Memory device 226 may include graphical user interface instructions 230 to facilitate graphic user interface processing; image processing instructions 232 to facilitate image data processing-related processes and functions; sensor processing instructions 234 to facilitate sensor-related processing and functions; web browsing instructions 236 to facilitate web browsing-related processes and functions; and other software instructions 238 to facilitate other processes and functions. Each of the above identified instructions and applications may correspond to a set of instructions for performing one or more functions described above. These instructions need not be implemented as separate software programs, procedures, or modules. Memory device 226 may include additional instructions or fewer instructions. Furthermore, various functions of server 135 may be implemented in hardware and/or in software, including in one or more signal processing and/or application specific integrated circuits. For example, server 135 may execute an image processing algorithm to identify in received images one or more products and/or obstacles, such as shopping carts, people, and more.

In one embodiment, memory device 226 may store database 140. Database 140 may include product type model data 240 (e.g., an image representation, a list of features, a model obtained by training a machine learning algorithm using training examples, an artificial neural network, and more) that may be used to identify products in received images; contract-related data 242 (e.g., planograms, promotions data, etc.) that may be used to determine if the placement of products on the store shelves and/or the promotion execution are consistent with obligations of retail store 105; catalog data 244 (e.g., retail store chain's catalog, retail store's master file, etc.) that may be used to check if all product types that should be offered in retail store 105 are in fact in the store, if the correct price is displayed next to an identified product, etc.; inventory data 246 that may be used to determine if additional products should be ordered from suppliers 115; employee data 248 (e.g., attendance data, records of training provided, evaluation and other performance-related communications, productivity information, etc.) that may be used to assign specific employees to certain tasks; and calendar data 250 (e.g., holidays, national days, international events, etc.) that may be used to determine if a possible change in a product model is associated with a certain event. In other embodiments of the disclosure, database 140 may store additional types of data or fewer types of data. Furthermore, various types of data may be stored in one or more memory devices other than memory device 226.
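
For illustration only, the following Python sketch shows one way the record types described for database 140 could be organized. The field names, types, and the reorder helper are assumptions added for clarity; they are not part of the disclosure.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical record types mirroring the data categories described for database 140.

@dataclass
class ProductTypeModel:               # product type model data 240
    product_id: str
    feature_vector: List[float]       # e.g., features or an embedding used to recognize the product in images
    model_path: Optional[str] = None  # e.g., a trained neural-network model

@dataclass
class ContractRecord:                 # contract-related data 242
    supplier_id: str
    planogram_id: str
    promotion_ids: List[str] = field(default_factory=list)

@dataclass
class CatalogEntry:                   # catalog data 244
    product_id: str
    listed_price: float
    should_be_offered: bool = True

@dataclass
class InventoryRecord:                # inventory data 246
    product_id: str
    units_on_shelf: int
    reorder_threshold: int

@dataclass
class CalendarEvent:                  # calendar data 250
    name: str
    day: date

def needs_reorder(record: InventoryRecord) -> bool:
    """Example use of inventory data 246: flag products to reorder from suppliers 115."""
    return record.units_on_shelf <= record.reorder_threshold
```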

The components and arrangements shown in FIG. 2 are not intended to limit the disclosed embodiments. As will be appreciated by a person skilled in the art having the benefit of this disclosure, numerous variations and/or modifications may be made to the depicted configuration of server 135. For example, not all components may be essential for the operation of server 135 in all cases. Any component may be located in any appropriate part of server 135, and the components may be rearranged into a variety of configurations while providing the functionality of the disclosed embodiments. For example, some servers may not include some of the elements shown in I/O system 210.

FIG. 3 is a block diagram representation of an example configuration of capturing device 125. In one embodiment, capturing device 125 may include a processing device 302, a memory interface 304, a network interface 306, and a peripherals interface 308 connected to image sensor 310. These components can be separated or can be integrated in one or more integrated circuits. The various components in capturing device 125 can be coupled by one or more communication buses or signal lines (e.g., bus 300). Different aspects of the functionalities of the various components in capturing device 125 may be understood from the description above regarding components of server 135 having similar functionality.

According to one embodiment, network interface 306 may be used to facilitate communication with server 135. Network interface 306 may be an Ethernet port connected to radio frequency receivers and transmitters and/or optical receivers and transmitters. The specific design and implementation of network interface 306 depends on the communications network(s) over which capturing device 125 is intended to operate. For example, in some embodiments, capturing device 125 may include a network interface 306 designed to operate over a GSM network, a GPRS network, an EDGE network, a Wi-Fi or WiMax network, a Bluetooth® network, etc. In another example, the implementation of network interface 306 may be similar or identical to the implementation described above for network interface 206.

In the example illustrated in FIG. 3, peripherals interface 308 of capturing device 125 may be connected to at least one image sensor 310 associated with at least one lens 312 for capturing image data in an associated field of view. In some configurations, capturing device 125 may include a plurality of image sensors associated with a plurality of lenses 312. In other configurations, image sensor 310 may be part of a camera included in capturing device 125. According to some embodiments, peripherals interface 308 may also be connected to other sensors (not shown), such as a motion sensor, a light sensor, an infrared sensor, a sound sensor, a proximity sensor, a temperature sensor, a biometric sensor, or other sensing devices to facilitate related functionalities. In addition, a positioning sensor may also be integrated with, or connected to, capturing device 125. For example, such a positioning sensor may be implemented using one of the following technologies: Global Positioning System (GPS), GLObal NAvigation Satellite System (GLONASS), Galileo global navigation system, BeiDou navigation system, other Global Navigation Satellite Systems (GNSS), Indian Regional Navigation Satellite System (IRNSS), Local Positioning Systems (LPS), Real-Time Location Systems (RTLS), Indoor Positioning System (IPS), Wi-Fi based positioning systems, cellular triangulation, and so forth. For example, the positioning sensor may be built into mobile capturing device 125, such as smartphone devices. In another example, position software may allow mobile capturing devices to use internal or external positioning sensors (e.g., connecting via a serial port or Bluetooth).

Consistent with the present disclosure, capturing device 125 may include digital components that collect data from image sensor 310, transform it into an image, and store the image on a memory device 314 and/or transmit the image using network interface 306. In one embodiment, capturing device 125 may be fixedly mountable to a store shelf or to other objects in the retail store (such as walls, ceilings, floors, refrigerators, checkout stations, displays, dispensers, rods which may be connected to other objects in the retail store, and so forth). In one embodiment, capturing device 125 may be split into at least two housings such that only image sensor 310 and lens 312 may be visible on the store shelf, and the rest of the digital components may be located in a separate housing. An example of this type of capturing device is described below with reference to FIGS. 5-7.

Consistent with the present disclosure, capturing device 125 may use memory interface 304 to access memory device 314. Memory device 314 may include high-speed, random access memory and/or non-volatile memory such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (e.g., NAND, NOR) to store captured image data. Memory device 314 may store operating system instructions 316, such as DARWIN, RTXC, LINUX, iOS, UNIX, OS X, WINDOWS, or an embedded operating system such as VxWorks. Operating system 316 can include instructions for handling basic system services and for performing hardware dependent tasks. In some implementations, operating system 316 may include a kernel (e.g., UNIX kernel, LINUX kernel, and so forth). In addition, memory device 314 may store capturing instructions 318 to facilitate processes and functions related to image sensor 310; graphical user interface instructions 320 that enable a user associated with capturing device 125 to control the capturing device and/or to acquire images of an area-of-interest in a retail establishment; and application instructions 322 to facilitate a process for monitoring compliance of product placement or other processes.

The components and arrangements shown in FIG. 3 are not intended to limit the disclosed embodiments. As will be appreciated by a person skilled in the art having the benefit of this disclosure, numerous variations and/or modifications may be made to the depicted configuration of capturing device 125. For example, not all components are essential for the operation of capturing device 125 in all cases. Any component may be located in any appropriate part of capturing device 125, and the components may be rearranged into a variety of configurations while providing the functionality of the disclosed embodiments. For example, some capturing devices may not have lenses, and other capturing devices may include an external memory device instead of memory device 314.

FIGS. 4A-4C illustrate example configurations for capturing image data in retail store 105 according to disclosed embodiments. FIG. 4A illustrates how an aisle 400 of retail store 105 may be imaged using a plurality of capturing devices 125 fixedly connected to store shelves. FIG. 4B illustrates how aisle 400 of retail store 105 may be imaged using a handheld communication device. FIG. 4C illustrates how aisle 400 of retail store 105 may be imaged by robotic devices equipped with cameras.

With reference to FIG. 4A and consistent with the present disclosure, retail store 105 may include a plurality of capturing devices 125 fixedly mounted (for example, to store shelves, walls, ceilings, floors, refrigerators, checkout stations, displays, dispensers, rods which may be connected to other objects in the retail store, and so forth) and configured to collect image data. As depicted, one side of an aisle 400 may include a plurality of capturing devices 125 (e.g., 125A, 125B, and 125C) fixedly mounted thereon and directed such that they may capture images of an opposing side of aisle 400. The plurality of capturing devices 125 may be connected to an associated mobile power source (e.g., one or more batteries) or to an external power supply (e.g., a power grid), may obtain electrical power from a wireless power transmission system, and so forth. As depicted in FIG. 4A, the plurality of capturing devices 125 may be placed at different heights and at least their vertical fields of view may be adjustable. Generally, both sides of aisle 400 may include capturing devices 125 in order to cover both sides of aisle 400.

Differing numbers of capturing devices 125 may be used to cover shelving unit 402. In addition, there may be an overlap region in the horizontal fields of view of some of capturing devices 125. For example, the horizontal fields of view of capturing devices (e.g., adjacent capturing devices) may at least partially overlap with one another. In another example, one capturing device may have a lower field of view than the field of view of a second capturing device, and the two capturing devices may have at least partially overlapping fields of view. According to one embodiment, each capturing device 125 may be equipped with network interface 306 for communicating with server 135. In one embodiment, the plurality of capturing devices 125 in retail store 105 may be connected to server 135 via a single WLAN. Network interface 306 may transmit information associated with a plurality of images captured by the plurality of capturing devices 125 for analysis purposes. In one example, server 135 may determine an existence of an occlusion event (such as by a person or by store equipment, such as a ladder, cart, etc.) and may provide a notification to resolve the occlusion event. In another example, server 135 may determine if a disparity exists between at least one contractual obligation and product placement as determined based on automatic analysis of the plurality of images. The transmitted information may include raw images, cropped images, processed image data, data about products identified in the images, and so forth. Network interface 306 may also transmit information identifying the location of the plurality of capturing devices 125 in retail store 105.

With reference to FIG. 4B and consistent with the present disclosure, server 135 may receive image data captured by users 120. In a first embodiment, server 135 may receive image data acquired by store employees. In one implementation, a handheld device of a store employee (e.g., capturing device 125D) may display a real-time video stream captured by the image sensor of the handheld device. The real-time video stream may be augmented with markings identifying to the store employee an area-of-interest that needs manual capturing of images. One of the situations in which manual image capture may be desirable may occur where the area-of-interest is outside the fields of view of a plurality of cameras fixedly connected to store shelves in aisle 400. In other situations, manual capturing of images of an area-of-interest may be desirable when a current set of acquired images is out of date (e.g., obsolete in at least one respect) or of poor quality (e.g., lacking focus, obstacles, lesser resolution, lack of light, and so forth). Additional details of this embodiment are described in Applicant's International Patent Application No. PCT/IB2018/001107, which is incorporated herein by reference.
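
As an illustration only, the decision of whether an area-of-interest should be flagged for manual capture could be expressed as a short check on coverage, freshness, and quality. The function name, thresholds, and quality score below are assumptions, not part of the disclosure.

```python
from datetime import datetime, timedelta

# Assumed thresholds for deciding when manual image capture is desirable.
MAX_IMAGE_AGE = timedelta(hours=2)   # freshness threshold (assumption)
MIN_SHARPNESS = 0.4                  # quality score in [0, 1] (assumption)

def manual_capture_needed(last_capture_time: datetime,
                          sharpness_score: float,
                          covered_by_fixed_cameras: bool) -> bool:
    """Return True when a store employee should be prompted to capture the area manually."""
    if not covered_by_fixed_cameras:
        return True                  # outside the fields of view of the fixed cameras
    if datetime.now() - last_capture_time > MAX_IMAGE_AGE:
        return True                  # current images are out of date
    if sharpness_score < MIN_SHARPNESS:
        return True                  # current images are of poor quality
    return False
```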

In a second embodiment, server 135 may receive image data acquired by crowd sourcing. In one exemplary implementation, server 135 may provide a request to a detected mobile device for an updated image of the area-of-interest in aisle 400. The request may include an incentive (e.g., $2 discount) to user 120 for acquiring the image. In response to the request, user 120 may acquire and transmit an up-to-date image of the area-of-interest. After receiving the image from user 120, server 135 may transmit the accepted incentive or agreed upon reward to user 120. The incentive may comprise a text notification and a redeemable coupon. In some embodiments, the incentive may include a redeemable coupon for a product associated with the area-of-interest. Server 135 may generate image-related data based on aggregation of data from images received from crowd sourcing and from images received from a plurality of cameras fixedly connected to store shelves. Additional details of this embodiment are described in Applicant's International Patent Application No. PCT/IB2017/000919, which is incorporated herein by reference.

With reference to FIG. 4C and consistent with the present disclosure, server 135 may receive image data captured by robotic devices with cameras traversing aisle 400. The present disclosure is not limited to the type of robotic devices used to capture images of retail store 105. In some embodiments, the robotic devices may include a robot on a track (e.g., a Cartesian robot configured to move along an edge of a shelf or in parallel to a shelf, such as capturing device 125E), a drone (e.g., capturing device 125F), and/or a robot that may move on the floor of the retail store (e.g., a wheeled robot such as capturing device 125G, a legged robot, a snake-like robot, and so forth). The robotic devices may be controlled by server 135 and may be operated remotely or autonomously. In one example, server 135 may instruct capturing device 125E to perform periodic scans at times when no customers or other obstructions are identified in aisle 400. Specifically, capturing device 125E may be configured to move along store shelf 404 and to capture images of products placed on store shelf 404, products placed on store shelf 406, or products located on opposing shelves (e.g., store shelf 408). In another example, server 135 may instruct capturing device 125F to perform a scan of the entire area of retail store 105 before the opening hour. In another example, server 135 may instruct capturing device 125G to capture images of a specific area-of-interest, similar to the manner described above with reference to receiving images acquired by store employees. In some embodiments, robotic capturing devices (such as 125F and 125G) may include an internal processing unit that may allow them to navigate autonomously within retail store 105. For example, the robotic capturing devices may use input from sensors (e.g., image sensors, depth sensors, proximity sensors, etc.) to avoid collision with objects or people, and to complete the scan of the desired area of retail store 105.

As discussed above with reference to FIG. 4A, the image data representative of products displayed on store shelves may be acquired by a plurality of stationary capturing devices 125 fixedly mounted in the retail store. One advantage of having stationary image capturing devices spread throughout retail store 105 is the potential for acquiring product images from set locations and on an ongoing basis, such that up-to-date product status may be determined for products throughout a retail store at any desired periodicity (e.g., in contrast to a moving camera system that may acquire product images more infrequently). However, there may be certain challenges in this approach. The distances and angles of the image capturing devices relative to the captured products should be selected so as to enable adequate product identification, especially when considered in view of image sensor resolution and/or optics specifications. For example, a capturing device placed on the ceiling of retail store 105 may have sufficient resolution and optics to enable identification of large products (e.g., a pack of toilet paper), but may be insufficient for identifying smaller products (e.g., deodorant packages). The image capturing devices should not occupy shelf space that is reserved for products for sale. The image capturing devices should not be positioned in places where there is a likelihood that their fields of view will be regularly blocked by different objects. The image capturing devices should be able to function for long periods of time with minimum maintenance. For example, a requirement for frequent replacement of batteries may render certain image acquisition systems cumbersome to use, especially where many image acquisition devices are in use throughout multiple locations in a retail store and across multiple retail stores. The image capturing devices should also include processing capabilities and transmission capabilities for providing real time or near real time image data about products. The disclosed image acquisition systems address these challenges.

FIG. 5A illustrates an example of a system 500 for acquiring images of products in retail store 105. Throughout the disclosure, capturing device 125 may refer to a system, such as system 500 shown in FIG. 5A. As shown, system 500 may include a first housing 502 configured for location on a retail shelving unit (e.g., as illustrated in FIG. 5B), and a second housing 504 configured for location on the retail shelving unit separate from first housing 502. The first and the second housing may be configured for mounting on the retail shelving unit in any suitable way (e.g., screws, bolts, clamps, adhesives, magnets, mechanical means, chemical means, and so forth). In some embodiments, first housing 502 may include an image capture device 506 (e.g., a camera module that may include image sensor 310) and second housing 504 may include at least one processor (e.g., processing device 302) configured to control image capture device 506 and also to control a network interface (e.g., network interface 306) for communicating with a remote server (e.g., server 135).

System 500 may also include a data conduit 508 extending between first housing 502 and second housing 504. Data conduit 508 may be configured to enable transfer of control signals from the at least one processor to image capture device 506 and to enable collection of image data acquired by image capture device 506 for transmission by the network interface. Consistent with the present disclosure, the term “data conduit” may refer to a communications channel that may include either a physical transmission medium such as a wire or a logical connection over a multiplexed medium such as a radio channel. In some embodiments, data conduit 508 may be used for conveying image data from image capture device 506 to at least one processor located in second housing 504. Consistent with one implementation of system 500, data conduit 508 may include flexible printed circuits and may have a length of at least about 5 cm, at least about 10 cm, at least about 15 cm, etc. The length of data conduit 508 may be adjustable to enable placement of first housing 502 separately from second housing 504. For example, in some embodiments, data conduit 508 may be retractable within second housing 504 such that the length of data conduit 508 exposed between first housing 502 and second housing 504 may be selectively adjusted.

In one embodiment, the length of data conduit 508 may enable first housing 502 to be mounted on a first side of a horizontal store shelf facing the aisle (e.g., store shelf 510 illustrated in FIG. 5B) and second housing 504 to be mounted on a second side of store shelf 510 that faces the direction of the ground (e.g., an underside of a store shelf). In this embodiment, data conduit 508 may be configured to bend around an edge of store shelf 510 or otherwise adhere to and follow contours of the shelving unit. For example, a first portion of data conduit 508 may be configured for location on the first side of store shelf 510 (e.g., a side facing an opposing retail shelving unit across an aisle) and a second portion of data conduit 508 may be configured for location on a second side of store shelf 510 (e.g., an underside of the shelf, which in some cases may be orthogonal to the first side). The second portion of data conduit 508 may be longer than the first portion of data conduit 508. Consistent with another embodiment, data conduit 508 may be configured for location within an envelope of a store shelf. For example, the envelope may include the outer boundaries of a channel located within a store shelf, a region on an underside of an L-shaped store shelf, a region between two store shelves, etc. Consistent with another implementation of system 500 discussed below, data conduit 508 may include a virtual conduit associated with a wireless communications link between first housing 502 and second housing 504.

FIG. 5B illustrates an exemplary configuration for mounting first housing 502 on store shelf 510. Consistent with the present disclosure, first housing 502 may be placed on store shelf 510, next to or embedded in a plastic cover that may be used for displaying prices. Alternatively, first housing 502 may be placed or mounted on any other location in retail store 105. For example, first housing 502 may be placed or mounted on the walls, on the ceiling, on refrigerator units, on display units, and more. The location and/or orientation of first housing 502 may be selected such that a field of view of image capture device 506 may cover at least a portion of an opposing retail shelving unit. Consistent with the present disclosure, image capture device 506 may have a view angle of between 50 and 80 degrees, for example, about 62 degrees, about 67 degrees, or about 75 degrees. Consistent with the present disclosure, image capture device 506 may include an image sensor having sufficient image resolution to enable detection of text associated with labels on an opposing retail shelving unit. In one embodiment, the image sensor may include m*n pixels. For example, image capture device 506 may have an 8 MP image sensor that includes an array of 3280*2464 pixels. Each pixel may include at least one photovoltaic cell that converts the photons of the incident light to an electric signal. The electrical signal may be converted to digital data by an A/D converter and processed by the image signal processor (ISP). In one embodiment, the image sensor of image capture device 506 may be associated with a pixel size of between 1.1×1.1 μm² and 1.7×1.7 μm², for example, 1.4×1.4 μm².
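
As a rough, illustrative check of whether such a sensor resolves label text across an aisle, the pixel density on the opposing shelving unit can be estimated from the view angle, the sensor width in pixels, and the aisle distance. The sketch below reuses the example values above (62-degree view angle, 3280-pixel-wide array, roughly 2 m aisle); the calculation itself is an added illustration, not part of the disclosure.

```python
import math

def pixels_per_cm(view_angle_deg: float, sensor_width_px: int, distance_m: float) -> float:
    """Approximate horizontal pixel density on a plane at the given distance."""
    scene_width_m = 2 * distance_m * math.tan(math.radians(view_angle_deg) / 2)
    return sensor_width_px / (scene_width_m * 100)

# Example: 62-degree view angle, 3280-pixel-wide sensor, opposing shelving unit ~2 m away.
density = pixels_per_cm(view_angle_deg=62, sensor_width_px=3280, distance_m=2.0)
print(f"{density:.1f} px/cm")   # roughly 14 px/cm, so a 5 cm label spans ~68 pixels
```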

Consistent with the present disclosure, image capture device 506 may be associated with a lens (e.g., lens 312) having a fixed focal length selected according to a distance expected to be encountered between retail shelving units on opposite sides of an aisle (e.g., distance d1 shown in FIG. 6A) and/or according to a distance expected to be encountered between a side of a shelving unit facing the aisle on one side of an aisle and a side of a shelving unit facing away from the aisle on the other side of the aisle (e.g., distance d2 shown in FIG. 6A). The focal length may also be based on any other expected distance between the image acquisition device and products to be imaged. As used herein, the term “focal length” refers to the distance from the optical center of the lens to a point where objects located at the point are substantially brought into focus. In contrast to zoom lenses, in fixed lenses the focus is not adjustable; it is typically set at the time of lens design and remains fixed. In one embodiment, the focal length of lens 312 may be selected based on the distance between two sides of aisles in the retail store (e.g., distance d1, distance d2, and so forth). In some embodiments, image capture device 506 may include a lens with a fixed focal length having a fixed value between 2.5 mm and 4.5 mm, such as about 3.1 mm, about 3.4 mm, or about 3.7 mm. For example, when distance d1 between two opposing retail shelving units is about 2 meters, the focal length of the lens may be about 3.6 mm. Unless indicated otherwise, the term “about” with regard to a numeric value is defined as a variance of up to 5% with respect to the stated value. Of course, image capture devices having non-fixed focal lengths may also be used, depending on the requirements of certain imaging environments, the power and space resources available, etc.
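
For illustration only, the relation between focal length, sensor width, and the width of shelf captured at distance d1 can be sketched with the usual similar-triangles approximation. The sensor width is derived from the example sensor above (3280 pixels at a 1.4 μm pitch, about 4.6 mm); the target scene width of 2.55 m is an assumption chosen to show values consistent with the ranges in the text.

```python
import math

def focal_length_mm(sensor_width_mm: float, distance_m: float, scene_width_m: float) -> float:
    """Similar-triangles estimate: f ~ sensor_width * distance / scene_width."""
    return sensor_width_mm * distance_m / scene_width_m

sensor_width_mm = 3280 * 1.4e-3                      # 3280 pixels * 1.4 um pitch ~ 4.6 mm
f = focal_length_mm(sensor_width_mm, distance_m=2.0, scene_width_m=2.55)
fov_deg = 2 * math.degrees(math.atan(sensor_width_mm / (2 * f)))
print(f"focal length ~ {f:.1f} mm, horizontal view angle ~ {fov_deg:.0f} degrees")
# ~3.6 mm and ~65 degrees, consistent with the example values given above.
```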

FIG. 5C illustrates an exploded view of second housing 504. In some embodiments, the network interface located in second housing 504 (e.g., network interface 306) may be configured to transmit to server 135 information associated with a plurality of images captured by image capture device 506. For example, the transmitted information may be used to determine if a disparity exists between at least one contractual obligation (e.g., planogram) and product placement. In one example, the network interface may support transmission speeds of 0.5 Mb/s, 1 Mb/s, 5 Mb/s, or more. Consistent with the present disclosure, the network interface may allow different modes of operation to be selected, such as: high-speed, slope-control, or standby. In high-speed mode, associated output drivers may have fast output rise and fall times to support high-speed bus rates; in slope-control mode, the electromagnetic interference may be reduced and the slope (i.e., the change of voltage per unit of time) may be proportional to the current output; and in standby mode, the transmitter may be switched off and the receiver may operate at a lower current.

Consistent with the present disclosure, second housing 504 may include a power port 512 for conveying energy from a power source to first housing 502. In one embodiment, second housing 504 may include a section for at least one mobile power source 514 (e.g., in the depicted configuration the section is configured to house four batteries). The at least one mobile power source may provide sufficient power to enable image capture device 506 to acquire more than 1,000 pictures, more than 5,000 pictures, more than 10,000 pictures, or more than 15,000 pictures, and to transmit them to server 135. In one embodiment, mobile power source 514 located in a single second housing 504 may power two or more image capture devices 506 mounted on the store shelf. For example, as depicted in FIGS. 6A and 6B, a single second housing 504 may be connected to a plurality of first housings 502 with a plurality of image capture devices 506 covering different (overlapping or non-overlapping) fields of view. Accordingly, the two or more image capture devices 506 may be powered by a single mobile power source 514, and/or the data captured by two or more image capture devices 506 may be processed to generate a panoramic image by a single processing device located in second housing 504. In addition to mobile power source 514 or as an alternative to mobile power source 514, second housing 504 may also be connected to an external power source. For example, second housing 504 may be mounted to a store shelf and connected to an electric power grid. In this example, power port 512 may be connected to the store shelf through a wire for providing electrical power to image capture device 506. In another example, a retail shelving unit or retail store 105 may include a wireless power transmission system, and power port 512 may be connected to a device configured to obtain electrical power from the wireless power transmission system. In addition, as discussed below, system 500 may use power management policies to reduce power consumption. For example, system 500 may use selective image capturing and/or selective transmission of images to reduce power consumption or conserve power.
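
For illustration only, a back-of-the-envelope energy budget can indicate whether a battery pack supports picture counts of this order over a deployment period. Every figure below (cell capacity, per-picture energy, idle draw) is an assumption introduced for the sketch and is not taken from the disclosure.

```python
# Rough energy budget for a battery-powered capturing device (all figures are assumptions).
BATTERY_WH = 4 * 2.5 * 3.7              # four assumed 2500 mAh, 3.7 V cells -> ~37 Wh
ENERGY_PER_PICTURE_J = 2.0              # assumed capture + compression + transmission cost per image
IDLE_POWER_W = 0.01                     # assumed sleep-mode draw between captures

def pictures_per_charge(days: float, pictures_per_day: float) -> float:
    """Estimate how many pictures one charge supports over a deployment period."""
    idle_j = IDLE_POWER_W * days * 24 * 3600
    usable_j = BATTERY_WH * 3600 - idle_j
    return min(usable_j / ENERGY_PER_PICTURE_J, days * pictures_per_day)

# Example: 90-day deployment at 200 pictures per day -> capped at 18,000 pictures,
# i.e., in the "more than 15,000 pictures" range under these assumptions.
print(pictures_per_charge(days=90, pictures_per_day=200))
```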

FIG. 6A illustrates a schematic diagram of a top view of aisle 600 in retail store 105 with multiple image acquisition systems 500 (e.g., 500A, 500B, 500C, 500D, and 500E) deployed thereon for acquiring images of products. Aisle 600 may include a first retail shelving unit 602 and a second retail shelving unit 604 that opposes first retail shelving unit 602. In some embodiments, different numbers of systems 500 may be mounted on opposing retail shelving units. For example, system 500A (including first housing 502A, second housing 504A, and data conduit 508A), system 500B (including first housing 502B, second housing 504B, and data conduit 508B), and system 500C (including first housing 502C, second housing 504C, and data conduit 508C) may be mounted on first retail shelving unit 602; and system 500D (including first housing 502D1, first housing 502D2, second housing 504D, and data conduits 508D1 and 508D2) and system 500E (including first housing 502E1, first housing 502E2, second housing 504E, and data conduits 508E1 and 508E2) may be mounted on second retail shelving unit 604. Consistent with the present disclosure, image capture device 506 may be configured relative to first housing 502 such that an optical axis of image capture device 506 is directed toward an opposing retail shelving unit when first housing 502 is fixedly mounted on a retail shelving unit. For example, optical axis 606 of the image capture device associated with first housing 502B may be directed towards second retail shelving unit 604 when first housing 502B is fixedly mounted on first retail shelving unit 602. A single retail shelving unit may hold a number of systems 500 that include a plurality of image capturing devices. Each of the image capturing devices may be associated with a different field of view directed toward the opposing retail shelving unit. Different vantage points of differently located image capture devices may enable image acquisition relative to different sections of a retail shelf. For example, at least some of the plurality of image capturing devices may be fixedly mounted on shelves at different heights. Examples of such a deployment are illustrated in FIGS. 4A and 6B.

As shown in FIG. 6A, each first housing 502 may be associated with a data conduit 508 that enables exchanging of information (e.g., image data, control signals, etc.) between the at least one processor located in second housing 504 and image capture device 506 located in first housing 502. In some embodiments, data conduit 508 may include a wired connection that supports data transfer and may be used to power image capture device 506 (e.g., data conduit 508A, data conduit 508B, data conduit 508D1, data conduit 508D2, data conduit 508E1, and data conduit 508E2). Consistent with these embodiments, data conduit 508 may comply with a wired standard such as USB, Micro-USB, HDMI, Micro-HDMI, Firewire, Apple, etc. In other embodiments, data conduit 508 may be a wireless connection, such as a dedicated communications channel between the at least one processor located in second housing 504 and image capture device 506 located in first housing 502 (e.g., data conduit 508C). In one example, the communications channel may be established by two Near Field Communication (NFC) transceivers. In other examples, first housing 502 and second housing 504 may include interface circuits that comply with other short-range wireless standards such as Bluetooth, WiFi, ZigBee, etc.

In some embodiments of the disclosure, the at least one processor of system 500 may cause at least one image capture device 506 to periodically capture images of products located on an opposing retail shelving unit (e.g., images of products located on a shelf across an aisle from the shelf on which first housing 502 is mounted). The term “periodically capturing images” includes capturing an image or images at predetermined time intervals (e.g., every minute, every 30 minutes, every 150 minutes, every 300 minutes, etc.), capturing video, capturing an image every time a status request is received, and/or capturing an image subsequent to receiving input from an additional sensor, for example, an associated proximity sensor. Images may also be captured based on various other triggers or in response to various other detected events. In some embodiments, system 500 may receive an output signal from at least one sensor located on an opposing retail shelving unit. For example, system 500B may receive output signals from a sensing system located on second retail shelving unit 604. The output signals may be indicative of a sensed lifting of a product from second retail shelving unit 604 or a sensed positioning of a product on second retail shelving unit 604. In response to receiving the output signal from the at least one sensor located on second retail shelving unit 604, system 500B may cause image capture device 506 to capture one or more images of second retail shelving unit 604. Additional details on a sensing system, including the at least one sensor that generates output signals indicative of a sensed lifting of a product from an opposing retail shelving unit, are discussed below with reference to FIGS. 8-10.
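
For illustration only, the triggers just described (a periodic interval, a pending status request, or an output signal from a shelf sensor) could be combined in a small decision function of the kind sketched below. The function name and the 30-minute interval are assumptions for the sketch.

```python
import time

CAPTURE_INTERVAL_S = 30 * 60   # assumed periodic interval (every 30 minutes)

def should_capture(last_capture_ts: float,
                   status_request_pending: bool,
                   shelf_sensor_event: bool) -> bool:
    """Decide whether image capture device 506 should acquire an image now.

    shelf_sensor_event is True when a sensor on the opposing shelving unit reports
    a product being lifted from, or placed on, the shelf.
    """
    if shelf_sensor_event:
        return True
    if status_request_pending:
        return True
    return time.time() - last_capture_ts >= CAPTURE_INTERVAL_S
```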

Consistent with embodiments of the disclosure, system 500 may detect an object 608 in a selected area between first retail shelving unit 602 and second retail shelving unit 604. Such detection may be based on the output of one or more dedicated sensors (e.g., motion detectors, etc.) and/or may be based on image analysis of one or more images acquired by an image acquisition device. Such images, for example, may include a representation of a person or other object recognizable through various image analysis techniques (e.g., trained neural networks, Fourier transform analysis, edge detection, filters, face recognition, and so forth). The selected area may be associated with distance d1 between first retail shelving unit 602 and second retail shelving unit 604. The selected area may be within the field of view of image capture device 506 or an area where the object causes an occlusion of a region of interest (such as a shelf, a portion of a shelf being monitored, and more). Upon detecting object 608, system 500 may cause image capture device 506 to forgo image acquisition while object 608 is within the selected area. In one example, object 608 may be an individual, such as a customer or a store employee. In another example, detected object 608 may be an inanimate object, such as a cart, box, carton, one or more products, a cleaning robot, etc. In the example illustrated in FIG. 6A, system 500A may detect that object 608 has entered its associated field of view (e.g., using a proximity sensor) and may instruct image capturing device 506 to forgo image acquisition. In alternative embodiments, system 500 may analyze a plurality of images acquired by image capture device 506 and identify at least one image of the plurality of images that includes a representation of object 608. Thereafter, system 500 may avoid transmission of at least part of the at least one identified image and/or information based on the at least one identified image to server 135.
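
The alternative embodiment in the last two sentences, filtering out acquired images in which the occluding object appears before anything is transmitted, might be sketched as follows. The detector passed in is a placeholder for whatever image analysis technique is used; the function and parameter names are assumptions.

```python
from typing import Callable, Iterable, List

def images_to_transmit(images: Iterable[bytes],
                       contains_object: Callable[[bytes], bool]) -> List[bytes]:
    """Keep only images without a detected occluding object (person, cart, box, etc.).

    contains_object stands in for any detector (e.g., a trained neural network or an
    edge/motion-based check) that returns True when object 608 appears in the image.
    """
    return [img for img in images if not contains_object(img)]
```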

As shown in FIG. 6A, the at least one processor contained in a second housing 504 may control a plurality of image capture devices 506 contained in a plurality of first housings 502 (e.g., systems 500D and 500E). Controlling image capturing device 506 may include instructing image capturing device 506 to capture an image and/or to transmit captured images to a remote server (e.g., server 135). In some cases, each of the plurality of image capture devices 506 may have a field of view that at least partially overlaps with a field of view of at least one other image capture device 506 from among the plurality of image capture devices 506. In one embodiment, the plurality of image capture devices 506 may be configured for location on one or more horizontal shelves and may be directed to substantially different areas of the opposing first retail shelving unit. In this embodiment, the at least one processor may control the plurality of image capture devices such that each of the plurality of image capture devices may capture an image at a different time. For example, system 500E may have a second housing 504E with at least one processor that may instruct a first image capturing device contained in first housing 502E1 to capture an image at a first time and may instruct a second image capturing device contained in first housing 502E2 to capture an image at a second time which differs from the first time. Capturing images at different times (or forwarding them to the at least one processor at different times) may assist in processing the images and writing the images to the memory associated with the at least one processor.
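
As an added illustration, staggering capture times across the devices controlled by one processor could be as simple as assigning each device a distinct offset within a capture cycle. The 30-minute cycle and the device identifiers below are assumptions for the sketch.

```python
from typing import Dict, Sequence

def staggered_schedule(device_ids: Sequence[str], cycle_s: float = 1800.0) -> Dict[str, float]:
    """Assign each image capture device a distinct capture offset within one cycle.

    A processor controlling several devices (e.g., in second housing 504E) could use
    these offsets so captures and memory writes do not coincide.
    """
    step = cycle_s / max(len(device_ids), 1)
    return {dev: i * step for i, dev in enumerate(device_ids)}

# Example: two devices capture 15 minutes apart within each 30-minute cycle.
print(staggered_schedule(["502E1", "502E2"]))   # {'502E1': 0.0, '502E2': 900.0}
```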

FIG. 6B illustrates a perspective view assembly diagram depicting a portion of a retail shelving unit 620 with multiple systems 500 (e.g., 500F, 500G, 500H, 500I, and 500J) deployed thereon for acquiring images of products. Retail shelving unit 620 may include horizontal shelves at different heights. For example, horizontal shelves 622A, 622B, and 622C are located below horizontal shelves 622D, 622E, and 622F. In some embodiments, a different number of systems 500 may be mounted on shelves at different heights. For example, system 500F (including first housing 502F and second housing 504F), system 500G (including first housing 502G and second housing 504G), and system 500H (including first housing 502H and second housing 504H) may be mounted on horizontal shelves associated with a first height; and system 500I (including first housing 502I, second housing 504I, and a projector 632) and system 500J (including first housing 502J1, first housing 502J2, and second housing 504J) may be mounted on horizontal shelves associated with a second height. In some embodiments, retail shelving unit 620 may include a horizontal shelf with at least one designated place (not shown) for mounting a housing of image capturing device 506. The at least one designated place may be associated with connectors such that first housing 502 may be fixedly mounted on a side of horizontal shelf 622 facing an opposing retail shelving unit using the connectors.

Consistent with the present disclosure, system 500 may be mounted on a retail shelving unit that includes at least two adjacent horizontal shelves (e.g., shelves 622A and 622B) forming a substantially continuous surface for product placement. The store shelves may include standard store shelves or customized store shelves. A length of each store shelf 622 may be at least 50 cm, less than 200 cm, or between 75 cm and 175 cm. In one embodiment, first housing 502 may be fixedly mounted on the retail shelving unit in a slit between two adjacent horizontal shelves. For example, first housing 502G may be fixedly mounted on retail shelving unit 620 in a slit between horizontal shelf 622B and horizontal shelf 622C. In another embodiment, first housing 502 may be fixedly mounted on a first shelf and second housing 504 may be fixedly mounted on a second shelf. For example, first housing 502I may be mounted on horizontal shelf 622D and second housing 504I may be mounted on horizontal shelf 622E. In another embodiment, first housing 502 may be fixedly mounted on a retail shelving unit on a first side of a horizontal shelf facing the opposing retail shelving unit and second housing 504 may be fixedly mounted on retail shelving unit 620 on a second side of the horizontal shelf orthogonal to the first side. For example, first housing 502H may be mounted on a first side 624 of horizontal shelf 622C next to a label, and second housing 504H may be mounted on a second side 626 of horizontal shelf 622C that faces down (e.g., towards the ground or towards a lower shelf). In another embodiment, second housing 504 may be mounted closer to the back of the horizontal shelf than to the front of the horizontal shelf. For example, second housing 504H may be fixedly mounted on horizontal shelf 622C on second side 626 closer to third side 628 of horizontal shelf 622C than to first side 624. Third side 628 may be parallel to first side 624. As mentioned above, data conduit 508 (e.g., data conduit 508H) may have an adjustable or selectable length for extending between first housing 502 and second housing 504. In one embodiment, when first housing 502H is fixedly mounted on first side 624, the length of data conduit 508H may enable second housing 504H to be fixedly mounted on second side 626 closer to third side 628 than to first side 624.

As mentioned above, at least one processor contained in a single second housing 504 may control a plurality of image capture devices 506 contained in a plurality of first housings 502 (e.g., system 500J). In some embodiments, the plurality of image capture devices 506 may be configured for location on a single horizontal shelf and may be directed to substantially the same area of the opposing first retail shelving unit (e.g., system 500D in FIG. 6A). In these embodiments, the image data acquired by the first image capture device and the second image capture device may enable a calculation of depth information (e.g., based on image parallax information) associated with at least one product positioned on an opposing retail shelving unit. For example, system 500J may have a single second housing 504J with at least one processor that may control a first image capturing device contained in first housing 502J1 and a second image capturing device contained in first housing 502J2. The distance d3 between the first image capture device contained in first housing 502J1 and the second image capture device contained in first housing 502J2 may be selected based on the distance between retail shelving unit 620 and the opposing retail shelving unit (e.g., similar to d1 and/or d2). For example, distance d3 may be at least 5 cm, at least 10 cm, at least 15 cm, less than 40 cm, less than 30 cm, between about 5 cm and about 20 cm, or between about 10 cm and about 15 cm. In another example, d3 may be a function of d1 and/or d2, a linear function of d1 and/or d2, a function of d1*log(d1) and/or d2*log(d2) such as a1*d1*log(d1) for some constant a1, and so forth. The data from the first image capturing device contained in first housing 502J1 and the second image capturing device contained in first housing 502J2 may be used to estimate the number of products on a store shelf of retail shelving unit 602. In related embodiments, system 500 may control a projector (e.g., projector 632) and image capture device 506 that are configured for location on a single store shelf or on two separate store shelves. For example, projector 632 may be mounted on horizontal shelf 622E and image capture device 506I may be mounted on horizontal shelf 622D. The image data acquired by image capture device 506 (e.g., included in first housing 502I) may include reflections of light patterns projected from projector 632 on the at least one product and/or the opposing retail shelving unit and may enable a calculation of depth information associated with at least one product positioned on the opposing retail shelving unit. The distance between projector 632 and the image capture device contained in first housing 502I may be selected based on the distance between retail shelving unit 620 and the opposing retail shelving unit (e.g., similar to d1 and/or d2). For example, the distance between the projector and the image capture device may be at least 5 cm, at least 10 cm, at least 15 cm, less than 40 cm, less than 30 cm, between about 5 cm and about 20 cm, or between about 10 cm and about 15 cm. In another example, the distance between the projector and the image capture device may be a function of d1 and/or d2, a linear function of d1 and/or d2, a function of d1*log(d1) and/or d2*log(d2) such as a1*d1*log(d1) for some constant a1, and so forth.
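
The parallax-based depth calculation mentioned above typically follows the standard stereo relation, depth = focal length × baseline / disparity, where the baseline corresponds to distance d3 between the two image capture devices. The numeric values in the sketch below (focal length, pixel pitch, baseline, disparity) are illustrative assumptions only.

```python
def depth_from_disparity(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Standard stereo relation: depth = f * B / disparity."""
    return focal_length_px * baseline_m / disparity_px

# Assumed values: 3.6 mm lens with 1.4 um pixels -> f ~ 2571 px; baseline d3 = 0.10 m.
focal_length_px = 3.6e-3 / 1.4e-6
print(depth_from_disparity(focal_length_px, baseline_m=0.10, disparity_px=130))  # ~1.98 m
```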

Consistent with the present disclosure, a central communication device 630 may be located in retail store 105 and may be configured to communicate with server 135 (e.g., via an Internet connection). The central communication device may also communicate with a plurality of systems 500 (for example, less than ten, ten, eleven, twelve, more than twelve, and so forth). In some cases, at least one system of the plurality of systems 500 may be located in proximity to central communication device 630. In the illustrated example, system 500F may be located in proximity to central communication device 630. In some embodiments, at least some of systems 500 may communicate directly with at least one other system 500. The communications between some of the plurality of systems 500 may happen via a wired connection, such as the communications between system 500J and system 500I and the communications between system 500H and system 500G. Additionally or alternatively, the communications between some of the plurality of systems 500 may occur via a wireless connection, such as the communications between system 500G and system 500F and the communications between system 500I and system 500F. In some examples, at least one system 500 may be configured to transmit captured image data (or information derived from the captured image data) to central communication device 630 via at least two mediating systems 500, at least three mediating systems 500, at least four mediating systems 500, or more. For example, system 500J may convey captured image data to central communication device 630 via system 500I and system 500F.

Consistent with the present disclosure, two (or more) systems 500 may share information to improve image acquisition. For example, system 500J may be configured to receive from a neighboring system 500I information associated with an event that system 500I had identified, and to control image capture device 506 based on the received information. For example, system 500J may forgo image acquisition based on an indication from system 500I that an object has entered or is about to enter its field of view. Systems 500I and 500J may have overlapping fields of view or non-overlapping fields of view. In addition, system 500J may also receive (from system 500I) information that originates from central communication device 630 and control image capture device 506 based on the received information. For example, system 500I may receive instructions from central communication device 630 to capture an image when supplier 115 inquires about a specific product that is placed in a retail unit opposing system 500I. In some embodiments, a plurality of systems 500 may communicate with central communication device 630. In order to reduce or avoid network congestion, each system 500 may identify an available transmission time slot. Thereafter, each system 500 may determine a default time slot for future transmissions based on the identified transmission time slot.
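
For illustration only, one way a system 500 could derive a default transmission slot from an observed available slot is sketched below. The slot count, the occupied-slot bookkeeping, and the function name are assumptions introduced for the example.

```python
from typing import Set

SLOTS_PER_CYCLE = 120   # assumed: a 10-minute cycle divided into 5-second transmission slots

def choose_default_slot(occupied_slots: Set[int], preferred_slot: int) -> int:
    """Pick the first free slot at or after the slot the system observed to be available.

    occupied_slots would be learned by noting when neighboring systems 500 transmit;
    preferred_slot is the slot the system first identified as available.
    """
    for offset in range(SLOTS_PER_CYCLE):
        candidate = (preferred_slot + offset) % SLOTS_PER_CYCLE
        if candidate not in occupied_slots:
            return candidate
    raise RuntimeError("no free transmission slot in this cycle")

print(choose_default_slot({0, 1, 2}, preferred_slot=1))   # -> 3
```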

FIG. 6C provides a diagrammatic representation of a retail shelving unit 640 being captured by multiple systems 500 (e.g., system 500K and system 500L) deployed on an opposing retail shelving unit (not shown). FIG. 6C illustrates embodiments associated with the process of installing systems 500 in retail store 105. To facilitate the installation of system 500, each first housing 502 (e.g., first housing 502K) may include an adjustment mechanism 642 for setting a field of view 644 of image capture device 506K such that the field of view 644 will at least partially encompass products placed both on a bottom shelf of retail shelving unit 640 and on a top shelf of retail shelving unit 640. For example, adjustment mechanism 642 may enable setting the position of image capture device 506K relative to first housing 502K. Adjustment mechanism 642 may have at least two degrees of freedom to separately adjust manually (or automatically) the vertical field of view and the horizontal field of view of image capture device 506K. In one embodiment, the angle of image capture device 506K may be measured using position sensors associated with adjustment mechanism 642, and the measured orientation may be used to determine if image capture device 506K is positioned in the right direction. In one example, the output of the position sensors may be displayed on a handheld device of an employee installing image capturing device 506K. Such an arrangement may provide the employee/installer with real time visual feedback representative of the field of view of an image acquisition device being installed.

In addition to adjustment mechanism 642, first housing 502 may include a first physical adapter (not shown) configured to operate with multiple types of image capture device 506 and a second physical adapter (not shown) configured to operate with multiple types of lenses. During installation, the first physical adapter may be used to connect a suitable image capture device 506 to system 500 according to the level of recognition requested (e.g., detecting a barcode from products, detecting text and price from labels, detecting different categories of products, and so forth). Similarly, during installation, the second physical adapter may be used to associate a suitable lens to image capture device 506 according to the physical conditions at the store (e.g., the distance between the aisles, the horizontal field of view required from image capture device 506, and/or the vertical field of view required from image capture device 506). The second physical adapter provides the employee/installer the ability to select the focal length of lens 312 during installation according to the distance between retail shelving units on opposite sides of an aisle (e.g., distance d1 and/or distance d2 shown in FIG. 6A). In some embodiments, adjustment mechanism 642 may include a locking mechanism to reduce the likelihood of unintentional changes in the field of view of image capture device 506. Additionally or alternatively, the at least one processor contained in second housing 504 may detect changes in the field of view of image capture device 506 and issue a warning when a change is detected, when a change larger than a selected threshold is detected, when a change is detected for a duration longer than a selected threshold, and so forth.
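
As an added illustration of the warning logic in the last sentence, the measured orientation could be compared against the orientation recorded at installation, with a warning issued only when the deviation exceeds a threshold for longer than a set duration. The class name, the 3-degree threshold, and the 60-second duration are assumptions for the sketch.

```python
import time
from typing import Optional

ANGLE_THRESHOLD_DEG = 3.0   # assumed allowed deviation from the installed orientation
MIN_DURATION_S = 60.0       # assumed persistence required before a warning is issued

class FovDriftMonitor:
    """Warn when the measured camera orientation drifts from its installed orientation."""

    def __init__(self, installed_angle_deg: float):
        self.installed_angle_deg = installed_angle_deg
        self.drift_started_at: Optional[float] = None

    def update(self, measured_angle_deg: float, now: Optional[float] = None) -> bool:
        """Return True when a warning should be issued for the current measurement."""
        now = time.time() if now is None else now
        if abs(measured_angle_deg - self.installed_angle_deg) <= ANGLE_THRESHOLD_DEG:
            self.drift_started_at = None      # back within tolerance, reset the timer
            return False
        if self.drift_started_at is None:
            self.drift_started_at = now       # deviation just started
        return now - self.drift_started_at >= MIN_DURATION_S
```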

In addition to adjustment mechanism 642 and the different physical adapters, system 500 may modify the image data acquired by image capture device 506 based on at least one attribute associated with opposing retail shelving unit 640. Consistent with the present disclosure, the at least one attribute associated with retail shelving unit 640 may include a lighting condition, the dimensions of opposing retail shelving unit 640, the size of products displayed on opposing retail shelving unit 640, the type of labels used on opposing retail shelving unit 640, and more. In some embodiments, the attribute may be determined, based on analysis of one or more acquired images, by at least one processor contained in second housing 504. Alternatively, the attribute may be automatically sensed and conveyed to the at least one processor contained in second housing 504. In one example, the at least one processor may change the brightness of captured images based on the detected light conditions. In another example, the at least one processor may modify the image data by cropping the image such that it will include only the products on the retail shelving unit (e.g., not include the floor or the ceiling), only the area of the shelving unit relevant to a selected task (such as a planogram compliance check), and so forth.
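
For illustration only, the two example modifications above (a brightness correction tied to the detected lighting condition and a crop to the shelf region) could look like the sketch below. The gain factor, the crop coordinates, and the frame dimensions are assumptions, not values from the disclosure.

```python
import numpy as np

def adjust_for_shelf(image: np.ndarray,
                     brightness_gain: float,
                     shelf_box: tuple) -> np.ndarray:
    """Apply a brightness correction and crop to the region covering the opposing shelf.

    brightness_gain would be derived from the detected lighting condition; shelf_box is
    (top, bottom, left, right) in pixels, chosen so the floor and ceiling are excluded.
    Both are assumptions introduced for this example.
    """
    top, bottom, left, right = shelf_box
    corrected = np.clip(image.astype(np.float32) * brightness_gain, 0, 255).astype(np.uint8)
    return corrected[top:bottom, left:right]

# Example on a synthetic 1080x1920 grayscale frame.
frame = np.full((1080, 1920), 100, dtype=np.uint8)
out = adjust_for_shelf(frame, brightness_gain=1.2, shelf_box=(200, 900, 0, 1920))
print(out.shape, out.max())   # (700, 1920) 120
```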

Consistent with the present disclosure, during installation, system 500 may enable real-time display 646 of field of view 644 on a handheld device 648 of a user 650 installing image capturing device 506K. In one embodiment, real-time display 646 of field of view 644 may include augmented markings 652 indicating a location of a field of view 654 of an adjacent image capture device 506L. In another embodiment, real-time display 646 of field of view 644 may include augmented markings 656 indicating a region of interest in opposing retail shelving unit 640. The region of interest may be determined based on a planogram, identified product type, and/or part of retail shelving unit 640. For example, the region of interest may include products with a greater likelihood of planogram incompliance. In addition, system 500K may analyze acquired images to determine if field of view 644 includes the area that image capturing device 506K is supposed to monitor (for example, from labels on opposing retail shelving unit 640, products on opposing retail shelving unit 640, images captured from other image capturing devices that may capture other parts of opposing retail shelving unit 640 or capture the same part of opposing retail shelving unit 640 but in a lower resolution or at a lower frequency, and so forth). In additional embodiments, system 500 may further comprise an indoor location sensor which may help determine if the system 500 is positioned at the right location in retail store 105.

In some embodiments, an anti-theft device may be located in at least one of first housing 502 and second housing 504. For example, the anti-theft device may include a specific RF label or a pin-tag radio-frequency identification device, which may be the same or similar to a type of anti-theft device that is used by retail store 105 in which system 500 is located. The RF label or the pin-tag may be incorporated within the body of first housing 502 and second housing 504 and may not be visible. In another example, the anti-theft device may include a motion sensor whose output may be used to trigger an alarm in the case of motion or disturbance, in case of motion that is above a selected threshold, and so forth.

FIG. 7A includes a flowchart representing an exemplary method 700 for acquiring images of products in retail store 105 in accordance with example embodiments of the present disclosure. For purposes of illustration, in the following description, reference is made to certain components of system 500 as deployed in the configuration depicted in FIG. 6A. It will be appreciated, however, that other implementations are possible and that other configurations may be utilized to implement the exemplary method. It will also be readily appreciated that the illustrated method can be altered to modify the order of steps, delete steps, or further include additional steps.

At step 702, the method includes fixedly mounting on first retail shelving unit 602 at least one first housing 502 containing at least one image capture device 506 such that an optical axis (e.g., optical axis 606) of at least one image capture device 506 is directed to second retail shelving unit 604. In one embodiment, fixedly mounting first housing 502 on first retail shelving unit 602 may include placing first housing 502 on a side of store shelf 622 facing second retail shelving unit 604. In another embodiment, fixedly mounting first housing 502 on retail shelving unit 602 may include placing first housing 502 in a slit between two adjacent horizontal shelves. In some embodiments, the method may further include fixedly mounting on first retail shelving unit 602 at least one projector (such as projector 632) such that light patterns projected by the at least one projector are directed to second retail shelving unit 604. In one embodiment, the method may include mounting the at least one projector to first retail shelving unit 602 at a selected distance from first housing 502 with image capture device 506. In one embodiment, the selected distance may be at least 5 cm, at least 10 cm, at least 15 cm, less than 40 cm, less than 30 cm, between about 5 cm and about 20 cm, or between about 10 cm and about 15 cm. In one embodiment, the selected distance may be calculated according to a distance between first retail shelving unit 602 and second retail shelving unit 604, such as d1 and/or d2, for example selecting the distance to be a function of d1 and/or d2, a linear function of d1 and/or d2, a function of d1*log(d1) and/or d2*log(d2) such as a1*d1*log(d1) for some constant a1, and so forth.
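
As a purely illustrative sketch of the last distance function mentioned above, the snippet below evaluates a1*d1*log(d1) and clamps the result to a practical mounting range. The constant a1 and the clamping limits are assumptions chosen for the example, not values given in this disclosure.

```python
# Hedged sketch of the a1*d1*log(d1) distance selection (constants assumed).
import math

def projector_offset_cm(d1_cm: float, a1: float = 0.05,
                        min_cm: float = 5.0, max_cm: float = 40.0) -> float:
    """Select a projector-to-camera distance as a1 * d1 * log(d1),
    clamped to a practical mounting range."""
    raw = a1 * d1_cm * math.log(d1_cm)
    return min(max(raw, min_cm), max_cm)

print(projector_offset_cm(120.0))  # e.g., an aisle width d1 of 120 cm
```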

At step 704, the method includes fixedly mounting on first retail shelving unit 602 second housing 504 at a location spaced apart from the at least one first housing 502; second housing 504 may include at least one processor (e.g., processing device 302). In one embodiment, fixedly mounting second housing 504 on the retail shelving unit may include placing second housing 504 on a different side of store shelf 622 than the side first housing 502 is mounted on.

At step 706, the method includes extending at least one data conduit 508 between at least one first housing 502 and second housing 504. In one embodiment, extending at least one data conduit 508 between at least one first housing 502 and second housing 504 may include adjusting the length of data conduit 508 to enable first housing 502 to be mounted separately from second housing 504. At step 708, the method includes capturing images of second retail shelving unit 604 using at least one image capture device 506 contained in at least one first housing 502 (e.g., first housing 502A, first housing 502B, or first housing 502C). In one embodiment, the method further includes periodically capturing images of products located on second retail shelving unit 604. In another embodiment, the method includes capturing images of second retail shelving unit 604 after receiving a trigger from at least one additional sensor in communication with system 500 (wireless or wired).

At step 710, the method includes transmitting at least some of the captured images from second housing 504 to a remote server (e.g., server 135) configured to determine planogram compliance relative to second retail shelving unit 604. In some embodiments, determining planogram compliance relative to second retail shelving unit 604 may include determining at least one characteristic of planogram compliance based on detected differences between the at least one planogram and the actual placement of the plurality of product types on second retail shelving unit 604. Consistent with the present disclosure, the characteristic of planogram compliance may include at least one of: product facing, product placement, planogram compatibility, price correlation, promotion execution, product homogeneity, restocking rate, and planogram compliance of adjacent products.

FIG. 7B provides a flowchart representing an exemplary method 720 for acquiring images of products in retail store 105, in accordance with example embodiments of the present disclosure. For purposes of illustration, in the following description, reference is made to certain components of system 500 as deployed in the configuration depicted in FIG. 6A. It will be appreciated, however, that other implementations are possible and that other configurations may be utilized to implement the exemplary method. It will also be readily appreciated that the illustrated method can be altered to modify the order of steps, delete steps, or further include additional steps.

At step 722, at least one processor contained in a second housing may receive, from at least one image capture device contained in at least one first housing fixedly mounted on a retail shelving unit, a plurality of images of an opposing retail shelving unit. For example, at least one processor contained in second housing 504A may receive from at least one image capture device 506 contained in first housing 502A (fixedly mounted on first retail shelving unit 602) a plurality of images of second retail shelving unit 604. The plurality of images may be captured and collected during a period of time (e.g., a minute, an hour, six hours, a day, a week, or more).

At step 724, the at least one processor contained in the second housing may analyze the plurality of images acquired by the at least one image capture device. In one embodiment, at least one processor contained in second housing 504A may use any suitable image analysis technique (for example, object recognition, object detection, image segmentation, feature extraction, optical character recognition (OCR), object-based image analysis, shape region techniques, edge detection techniques, pixel-based detection, artificial neural networks, convolutional neural networks, etc.) to identify objects in the plurality of images. In one example, the at least one processor contained in second housing 504A may determine the number of products located in second retail shelving unit 604. In another example, the at least one processor contained in second housing 504A may detect one or more objects in an area between first retail shelving unit 602 and second retail shelving unit 604.

At step 726, the at least one processor contained in the second housing may identify in the plurality of images a first image that includes a representation of at least a portion of an object located in an area between the retail shelving unit and the opposing retail shelving unit. At step 728, the at least one processor contained in the second housing may identify in the plurality of images a second image that does not include any object located in the area between the retail shelving unit and the opposing retail shelving unit. In one example, the object in the first image may be an individual, such as a customer or a store employee. In another example, the object in the first image may be an inanimate object, such as a cart, a box, a product, etc.

At step 730, the at least one processor contained in the second housing may instruct a network interface contained in the second housing, fixedly mounted on the retail shelving unit separate from the at least one first housing, to transmit the second image to a remote server and to avoid transmission of the first image to the remote server. In addition, the at least one processor may issue a notification when an object blocks the field of view of the image capturing device for more than a predefined period of time (e.g., at least 30 minutes, at least 75 minutes, at least 150 minutes).
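
For illustration only, the following sketch shows the selection logic of steps 726 through 730 under the assumption that a separate (here stubbed) object detector reports whether an image contains an object between the shelving units; the class and function names are hypothetical.

```python
# Hedged sketch: keep only images with an unobstructed view of the opposing
# shelf for transmission to the remote server (detector output is assumed).
from dataclasses import dataclass

@dataclass
class CapturedImage:
    image_id: str
    has_blocking_object: bool  # output of an object detector (assumed)

def select_images_for_upload(images: list[CapturedImage]) -> list[str]:
    """Return IDs of images that show an unobstructed opposing shelf."""
    return [img.image_id for img in images if not img.has_blocking_object]

batch = [CapturedImage("img_001", True),   # customer in the aisle -> skip
         CapturedImage("img_002", False)]  # clear view -> transmit
print(select_images_for_upload(batch))     # ['img_002']
```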

Embodiments of the present disclosure may automatically assess compliance of one or more store shelves with a planogram. For example, embodiments of the present disclosure may use signals from one or more sensors to determine placement of one or more products on store shelves. The disclosed embodiments may also use one or more sensors to determine empty spaces on the store shelves. The placements and empty spaces may be automatically assessed against a digitally encoded planogram. A planogram refers to any data structure or specification that defines at least one product characteristic relative to a display structure associated with a retail environment (such as a store shelf or an area of one or more shelves). Such product characteristics may include, among other things, quantities of products with respect to areas of the shelves, product configurations or product shapes with respect to areas of the shelves, product arrangements with respect to areas of the shelves, product density with respect to areas of the shelves, product combinations with respect to areas of the shelves, etc. Although described with reference to store shelves, embodiments of the present disclosure may also be applied to end caps or other displays; bins, shelves, or other organizers associated with refrigerator or freezer units; or any other display structure associated with a retail environment.
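
As one possible illustration of a digitally encoded planogram, the sketch below maps each shelf area to an expected product type, quantity, and facing count. The field names and example values are assumptions chosen for the example; the disclosure does not prescribe a particular encoding.

```python
# Hedged sketch of one way a planogram could be encoded as a data structure.
from dataclasses import dataclass

@dataclass
class PlanogramEntry:
    shelf_id: str
    area: tuple[float, float, float, float]  # (x0, y0, x1, y1) in cm
    product_type: str
    expected_quantity: int
    expected_facings: int

planogram = [
    PlanogramEntry("shelf_800", (0, 0, 60, 40), "cola_330ml_can", 24, 6),
    PlanogramEntry("shelf_800", (60, 0, 120, 40), "notepad_a5", 12, 4),
]
```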

The embodiments disclosed herein may use any sensors configured to detect one or more parameters associated with products (or a lack thereof). For example, embodiments may use one or more of pressure sensors, weight sensors, light sensors, resistive sensors, capacitive sensors, inductive sensors, vacuum pressure sensors, high pressure sensors, conductive pressure sensors, infrared sensors, photo-resistor sensors, photo-transistor sensors, photo-diode sensors, ultrasonic sensors, or the like. Some embodiments may use a plurality of different kinds of sensors, for example, associated with the same or overlapping areas of the shelves and/or associated with different areas of the shelves. Some embodiments may use a plurality of sensors configured to be placed adjacent to a store shelf, configured for location on the store shelf, configured to be attached to, or configured to be integrated with the store shelf. In some cases, at least part of the plurality of sensors may be configured to be placed next to a surface of a store shelf configured to hold products. For example, the at least part of the plurality of sensors may be configured to be placed relative to a part of a store shelf such that the at least part of the plurality of sensors may be positioned between the part of the store shelf and products placed on the part of the shelf. In another embodiment, the at least part of the plurality of sensors may be configured to be placed above and/or within and/or under the part of the shelf.

In one example, the plurality of sensors may include light detectors configured to be located such that a product placed on the part of the shelf may block at least some of the ambient light from reaching the light detectors. The data received from the light detectors may be analyzed to detect a product or to identify a product based on the shape of a product placed on the part of the shelf. In one example, the system may identify the product placed above the light detectors based on data received from the light detectors that may be indicative of at least part of the ambient light being blocked from reaching the light detectors. Further, the data received from the light detectors may be analyzed to detect vacant spaces on the store shelf. For example, the system may detect vacant spaces on the store shelf based on received data that may be indicative of no product being placed on a part of the shelf. In another example, the plurality of sensors may include pressure sensors configured to be located such that a product placed on the part of the shelf may apply detectable pressure on the pressure sensors. Further, the data received from the pressure sensors may be analyzed to detect a product or to identify a product based on the shape of a product placed on the part of the shelf. In one example, the system may identify the product placed above the pressure sensors based on data received from the pressure sensors being indicative of pressure being applied on the pressure sensors. In addition, the data from the pressure sensors may be analyzed to detect vacant spaces on the store shelf, for example based on readings that are indicative of no product being placed on a part of the shelf, such as when the pressure readings are below a selected threshold. Consistent with the present disclosure, inputs from different types of sensors (such as pressure sensors, light detectors, etc.) may be combined and analyzed together, for example to detect products placed on a store shelf, to identify shapes of products placed on a store shelf, to identify types of products placed on a store shelf, to identify vacant spaces on a store shelf, and so forth.
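
For illustration only, the following is a simplified sketch of how readings from a pressure sensor and a light detector could each flag an occupied versus a vacant cell of a shelf area. The threshold values and function names are assumptions, not parameters defined by this disclosure.

```python
# Hedged sketch: occupied/vacant decisions from pressure and light readings
# using assumed thresholds.
def cell_occupied_by_pressure(pressure_reading: float,
                              vacant_threshold: float = 0.05) -> bool:
    """Pressure near zero (below the threshold) suggests no product on the cell."""
    return pressure_reading >= vacant_threshold

def cell_occupied_by_light(light_reading: float,
                           ambient_level: float,
                           blocked_fraction: float = 0.5) -> bool:
    """A reading well below ambient suggests a product blocks the detector."""
    return light_reading < ambient_level * blocked_fraction

readings = {"pressure": 0.8, "light": 20.0}
print(cell_occupied_by_pressure(readings["pressure"]))                 # True
print(cell_occupied_by_light(readings["light"], ambient_level=100.0))  # True
```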

With reference to FIG. 8A and consistent with the present disclosure, a store shelf 800 may include a plurality of detection elements, e.g., detection elements 801A and 801B. In the example of FIG. 8A, detection elements 801A and 801B may comprise pressure sensors and/or other types of sensors for measuring one or more parameters (such as resistance, capacitance, or the like) based on physical contact (or lack thereof) with products, e.g., product 803A and product 803B. Additionally or alternatively, detection elements configured to measure one or more parameters (such as current induction, magnetic induction, visual or other electromagnetic reflectance, visual or other electromagnetic emittance, or the like) may be included to detect products based on physical proximity (or lack thereof) to products. Consistent with the present disclosure, the plurality of detection elements may be configured for location on shelf 800. The plurality of detection elements may be configured to detect placement of products when the products are placed above at least part of the plurality of detection elements. Some embodiments of the disclosure, however, may be performed when at least some of the detection elements may be located next to shelf 800 (e.g., for magnetometers or the like), across from shelf 800 (e.g., for image sensors or other light sensors, light detection and ranging (LIDAR) sensors, radio detection and ranging (RADAR) sensors, or the like), above shelf 800 (e.g., for acoustic sensors or the like), below shelf 800 (e.g., for pressure sensors or the like), or any other appropriate spatial arrangement. Although depicted as standalone units in the example of FIG. 8A, the plurality of detection elements may form part of a fabric (e.g., a smart fabric or the like), and the fabric may be positioned on a shelf to take measurements. For example, two or more detection elements may be integrated together into a single structure (e.g., disposed within a common housing, integrated together within a fabric or mat, and so forth). In some examples, detection elements (such as detection elements 801A and 801B) may be placed adjacent to (or placed on) store shelves as described above. Some examples of detection elements may include pressure sensors and/or light detectors configured to be placed above and/or within and/or under a store shelf as described above.

Detection elements associated with shelf 800 may be associated with different areas of shelf 800. For example, detection elements 801A and 801B are associated with area 805A while other detection elements are associated with area 805B. Although depicted as rows, areas 805A and 805B may comprise any areas of shelf 800, whether contiguous (e.g., a square, a rectangle, or another regular or irregular shape) or not (e.g., a plurality of rectangles or other regular and/or irregular shapes). Such areas may also include horizontal regions between shelves (as shown in FIG. 8A) or may include vertical regions that include areas of multiple different shelves (e.g., columnar regions spanning over several different horizontally arranged shelves). In some examples, the areas may be part of a single plane. In some examples, each area may be part of a different plane. In some examples, a single area may be part of a single plane or be divided across multiple planes.

One or more processors (e.g., processing device 202) configured to communicate with the detection elements (e.g., detection elements 801A and 801B) may detect first signals associated with a first area (e.g., areas 805A and/or 805B) and second signals associated with a second area. In some embodiments, the first area may, in part, overlap with the second area. For example, one or more detection elements may be associated with the first area as well as the second area and/or one or more detection elements of a first type may be associated with the first area while one or more detection elements of a second type may be associated with the second area overlapping, at least in part, the first area. In other embodiments, the first area and the second area may be spatially separate from each other.

The one or more processors may, using the first and second signals, determine that one or more products have been placed in the first area while the second area includes at least one empty area. For example, if the detection elements include pressure sensors, the first signals may include weight signals that match profiles of particular products (such as the mugs or plates depicted in the example of FIG. 8A), and the second signals may include weight signals indicative of the absence of products (e.g., by being equal to or within a threshold of a default value such as atmospheric pressure or the like). The disclosed weight signals may be representative of actual weight values associated with a particular product type or, alternatively, may be associated with a relative weight value sufficient to identify the product and/or to identify the presence of a product. In some cases, the weight signal may be suitable for verifying the presence of a product regardless of whether the signal is also sufficient for product identification. In another example, if the detection elements include light detectors (as described above), the first signals may include light signals that match profiles of particular products (such as the mugs or plates depicted in the example of FIG. 8A), and the second signals may include light signals indicative of the absence of products (e.g., by being equal to or within a threshold of a default value such as values corresponding to ambient light or the like). For example, the first light signals may be indicative of ambient light being blocked by particular products, while the second light signals may be indicative of no product blocking the ambient light. The disclosed light signals may be representative of actual light patterns associated with a particular product type or, alternatively, may be associated with light patterns sufficient to identify the product and/or to identify the presence of a product.
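
For illustration only, the sketch below applies the first-signal/second-signal idea to weight readings: values near an unloaded default are treated as empty, and other values are matched against stored product weight profiles within a tolerance. The profile weights and tolerance are assumptions introduced for the example.

```python
# Hedged sketch: classifying weight signals as a known product, empty, or
# unknown, using assumed profile values and tolerance.
WEIGHT_PROFILES_G = {"mug": 350.0, "plate": 600.0}
DEFAULT_UNLOADED_G = 0.0
TOLERANCE_G = 50.0

def classify_weight_signal(weight_g: float) -> str:
    if abs(weight_g - DEFAULT_UNLOADED_G) <= TOLERANCE_G:
        return "empty"
    for product, profile in WEIGHT_PROFILES_G.items():
        if abs(weight_g - profile) <= TOLERANCE_G:
            return product
    return "unknown"

print(classify_weight_signal(365.0))  # 'mug'
print(classify_weight_signal(12.0))   # 'empty'
```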

The one or more processors may similarly process signals from other types of sensors. For example, if the detection elements include resistive or inductive sensors, the first signals may include resistances, voltages, and/or currents that match profiles of particular products (such as the mugs or plates depicted in the example of FIG. 8A, or elements associated with the products, such as tags, etc.), and the second signals may include resistances, voltages, and/or currents indicative of the absence of products (e.g., by being equal to or within a threshold of a default value such as a default resistance, a default voltage, a default current, or the like). In another example, if the detection elements include acoustic, LIDAR, RADAR, or other reflective sensors, the first signals may include patterns of returning waves (whether sound, visible light, infrared light, radio, or the like) that match profiles of particular products (such as the mugs or plates depicted in the example of FIG. 8A), and the second signals may include patterns of returning waves (whether sound, visible light, infrared light, radio, or the like) indicative of the absence of products (e.g., by being equal to or within a threshold of a pattern associated with an empty shelf or the like).

Any of the profile matching described above may include direct matching of a subject to a threshold. For example, direct matching may include testing one or more measured values against the profile value(s) within a margin of error; mapping a received pattern onto a profile pattern with a residual having a maximum, minimum, integral, or the like within the margin of error; performing an autocorrelation, Fourier transform, convolution, or other operation on received measurements or a received pattern and comparing the resultant values or function against the profile within a margin of error; or the like. Additionally or alternatively, profile matching may include fuzzy matching between measured values and/or patterns and a database of profiles such that a profile with the highest level of confidence according to the fuzzy search is selected. Moreover, as depicted in the example of FIG. 8A, products, such as product 803B, may be stacked and thus associated with a different profile when stacked than when standalone.
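
The following sketch illustrates, under assumed profiles and margins, two of the matching strategies mentioned above: direct matching of a measured pattern against a profile within a margin of error, and a fuzzy search that returns the profile with the highest confidence. It is a simplified illustration, not the disclosure's implementation.

```python
# Hedged sketch: direct residual-based matching and a fuzzy best-match search
# over a small database of assumed base profiles.
import numpy as np

def direct_match(measured: np.ndarray, profile: np.ndarray,
                 margin: float = 0.1) -> bool:
    """Match when the maximum absolute residual stays within the margin."""
    return float(np.max(np.abs(measured - profile))) <= margin

def fuzzy_best_match(measured: np.ndarray,
                     profiles: dict[str, np.ndarray]) -> tuple[str, float]:
    """Return the profile name with the highest confidence (lowest distance)."""
    scores = {name: 1.0 / (1.0 + float(np.linalg.norm(measured - p)))
              for name, p in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

profiles = {"can_base": np.array([1.0, 1.0, 1.0, 1.0]),
            "bottle_base": np.array([1.0, 0.0, 1.0, 0.0])}
measured = np.array([0.95, 0.05, 1.02, 0.03])
print(direct_match(measured, profiles["bottle_base"]))  # True
print(fuzzy_best_match(measured, profiles))             # ('bottle_base', ...)
```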

Any of the profile matching described above may include use of one or more machine learning techniques. For example, one or more artificial neural networks, random forest models, or other models trained on measurements annotated with product identifiers may process the measurements from the detection elements and identify products therefrom. In such embodiments, the one or more models may use additional or alternative input, such as images of the shelf (e.g., from capturing devices 125 of FIGS. 4A-4C explained above) or the like.
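
As a hedged sketch of the machine-learning variant, the snippet below trains a random forest on detection-element measurements annotated with product identifiers. The feature layout (one reading per detection element), the labels, and the synthetic training data are assumptions for illustration only.

```python
# Hedged sketch: a random forest mapping detection-element readings to
# product identifiers (synthetic data, assumed feature layout).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: readings from 8 detection elements under one product placement.
X_train = np.random.rand(200, 8)
y_train = np.random.choice(["soda_can", "notepad", "empty"], size=200)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

new_readings = np.random.rand(1, 8)
print(model.predict(new_readings))  # e.g., ['soda_can']
```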

Based on detected products and/or empty spaces, determined using the first signals and second signals, the one or more processors may determine one or more aspects of planogram compliance. For example, the one or more processors may identify products and their locations on the shelves, determine quantities of products within particular areas (e.g., identifying stacked or clustered products), identify facing directions associated with the products (e.g., whether a product is outward facing, inward facing, askew, or the like), or the like. Identification of the products may include identifying a product type (e.g., a bottle of soda, a loaf of bread, a notepad, or the like) and/or a product brand (e.g., a Coca-Cola® bottle instead of a Sprite® bottle, a Starbucks® coffee tumbler instead of a Tervis® coffee tumbler, or the like). Product facing direction and/or orientation, for example, may be determined based on a detected orientation of an asymmetric shape of a product base using pressure sensitive pads, detected density of products, etc. For example, the product facing may be determined based on locations of detected product bases relative to certain areas of a shelf (e.g., along a front edge of a shelf), etc. Product facing may also be determined using image sensors, light sensors, or any other sensor suitable for detecting product orientation.

The one or more processors may generate one or more indicators of the one or more aspects of planogram compliance. For example, an indicator may comprise a data packet, a data file, or any other data structure indicating any variations from a planogram, e.g., with respect to product placement (such as encoding intended coordinates of a product and actual coordinates on the shelf), with respect to product facing direction and/or orientation (such as encoding indicators of locations that have products not facing a correct direction and/or in an undesired orientation), or the like.

In addition to or as an alternative to determining planogram compliance, the one or more processors may detect a change in measurements from one or more detection elements. Such measurement changes may trigger a response. For example, a change of a first type may trigger capture of at least one image of the shelf (e.g., using capturing devices 125 of FIGS. 4A-4C explained above) while a detected change of a second type may cause the at least one processor to forgo such capture. A first type of change may, for example, indicate the moving of a product from one location on the shelf to another location such that planogram compliance may be implicated. In such cases, it may be desired to capture an image of the product rearrangement in order to assess or reassess product planogram compliance. In another example, a first type of change may indicate the removal of a product from the shelf, e.g., by an employee due to damage, by a customer to purchase, or the like. On the other hand, a second type of change may, for example, indicate the removal and replacement of a product to the same (within a margin of error) location on the shelf, e.g., by a customer to inspect the item. In cases where products are removed from a shelf, but then replaced on the shelf (e.g., within a particular time window), the system may forgo a new image capture, especially if the replaced product is detected in a location similar to or the same as its recent, original position.
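
For illustration only, the sketch below captures the remove-and-replace case described above: a product removed and then replaced at approximately the same location within a time window is treated as a second-type change and does not trigger a new image capture. The tolerance and window values are assumptions, not values specified by this disclosure.

```python
# Hedged sketch: deciding whether a detected change should trigger image
# capture (thresholds and change-type names are assumed).
from typing import Optional

def should_capture(change_type: str,
                   displacement_cm: float = 0.0,
                   replaced_within_s: Optional[float] = None,
                   position_tolerance_cm: float = 3.0,
                   time_window_s: float = 60.0) -> bool:
    if change_type == "removal":
        return True  # product taken from the shelf -> reassess compliance
    if change_type == "remove_and_replace":
        # Same location within tolerance and within the time window -> forgo.
        if (replaced_within_s is not None and replaced_within_s <= time_window_s
                and displacement_cm <= position_tolerance_cm):
            return False
        return True
    if change_type == "relocation":
        return True  # product moved to another shelf location
    return False

print(should_capture("remove_and_replace", displacement_cm=1.0,
                     replaced_within_s=20.0))  # False -> forgo capture
```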

With reference to FIG. 8B and consistent with the present disclosure, a store shelf 850 may include a plurality of detection elements, e.g., detection elements 851A and 851B. In the example of FIG. 8B, detection elements 851A and 851B may comprise light sensors and/or other sensors measuring one or more parameters (such as visual or other electromagnetic reflectance, visual or other electromagnetic emittance, or the like) based on electromagnetic waves from products, e.g., product 853A and product 853B. Additionally or alternatively, as explained above with respect to FIG. 8A, detection elements 851A and 851B may comprise pressure sensors, other sensors measuring one or more parameters (such as resistance, capacitance, or the like) based on physical contact (or lack thereof) with the products, and/or other sensors that measure one or more parameters (such as current induction, magnetic induction, visual or other electromagnetic reflectance, visual or other electromagnetic emittance, or the like) based on physical proximity (or lack thereof) to products.

Moreover, although depicted as located on shelf 850, some detection elements may be located next to shelf 850 (e.g., for magnetometers or the like), across from shelf 850 (e.g., for image sensors or other light sensors, light detection and ranging (LIDAR) sensors, radio detection and ranging (RADAR) sensors, or the like), above shelf 850 (e.g., for acoustic sensors or the like), below shelf 850 (e.g., for pressure sensors, light detectors, or the like), or any other appropriate spatial arrangement. Further, although depicted as standalone in the example of FIG. 8B, the plurality of detection elements may form part of a fabric (e.g., a smart fabric or the like), and the fabric may be positioned on a shelf to take measurements.

Detection elements associated with shelf 850 may be associated with different areas of shelf 850, e.g., area 855A, area 855B, or the like. Although depicted as rows, areas 855A and 855B may comprise any areas of shelf 850, whether contiguous (e.g., a square, a rectangle, or another regular or irregular shape) or not (e.g., a plurality of rectangles or other regular and/or irregular shapes).

One or more processors (e.g., processing device 202) in communication with the detection elements (e.g., detection elements 851A and 851B) may detect first signals associated with a first area and second signals associated with a second area. Any of the processing of the first and second signals described above with respect to FIG. 8A may similarly be performed for the configuration of FIG. 8B.

In both FIGS. 8A and 8B, the detection elements may be integral to the shelf, part of a fabric or other surface configured for positioning on the shelf, or the like. Power and/or data cables may form part of the shelf, the fabric, the surface, or be otherwise connected to the detection elements. Additionally or alternatively, as depicted in FIGS. 8A and 8B, individual sensors may be positioned on the shelf. For example, the power and/or data cables may be positioned under the shelf and connected through the shelf to the detection elements. In another example, power and/or data may be transmitted wirelessly to the detection elements (e.g., to wireless network interface controllers forming part of the detection elements). In yet another example, the detection elements may include internal power sources (such as batteries or fuel cells).

With reference to FIG. 9 and consistent with the present disclosure, the detection elements described above with reference to FIGS. 8A and 8B may be arranged on rows of the shelf in any appropriate configuration. All of the arrangements of FIG. 9 are shown as a top-down view of a row (e.g., area 805A, area 805B, area 855A, area 855B, or the like) on the shelf. For example, arrangements 910 and 940 are both uniform distributions of detection elements within a row. However, arrangement 910 is also uniform throughout the depth of the row while arrangement 940 is staggered. Both arrangements may provide signals that represent products on the shelf in accordance with spatially uniform measurement locations. As further shown in FIG. 9, arrangements 920, 930, 950, and 960 cluster detection elements near the front (e.g., a facing portion) of the row. Arrangement 920 includes detection elements at a front portion while arrangement 930 includes detection elements in a larger portion of the front of the shelf. Such arrangements may save power and processing cycles by having fewer detection elements on a back portion of the shelf. Arrangements 950 and 960 include some detection elements in a back portion of the shelf, but these elements are arranged less densely than the detection elements in the front. Such arrangements may allow for detections in the back of the shelf (e.g., a need to restock products, a disruption to products in the back by a customer or employee, or the like) while still using less power and fewer processing cycles than arrangements 910 and 940. Such arrangements may include a higher density of detection elements in regions of the shelf (e.g., a front edge of the shelf) where product turnover rates may be higher than in other regions (e.g., at areas deeper into a shelf), and/or in regions of the shelf where planogram compliance is especially important.

FIG. 10A is a flow chart illustrating an exemplary method 1000 for monitoring planogram compliance on a store shelf, in accordance with the presently disclosed subject matter. It is contemplated that method 1000 may be used with any of the detection element arrays discussed above with reference to, for example, FIGS. 8A, 8B and 9. The order and arrangement of steps in method 1000 is provided for purposes of illustration. As will be appreciated from this disclosure, modifications may be made to process 1000, for example, adding, combining, removing, and/or rearranging one or more steps of process 1000.

Method 1000 may include a step 1005 of receiving first signals from a first subset of detection elements (e.g., detection elements 801A and 801B of FIG. 8A) from among the plurality of detection elements after one or more of a plurality of products (e.g., products 803A and 803B) are placed on at least one area of the store shelf associated with the first subset of detection elements. As explained above with respect to FIGS. 8A and 8B, the plurality of detection elements may be embedded into a fabric configured to be positioned on the store shelf. Additionally or alternatively, the plurality of detection elements may be configured to be integrated with the store shelf. For example, an array of pressure sensitive elements (or any other type of detector) may be fabricated as part of the store shelf. In some examples, the plurality of detection elements may be configured to be placed adjacent to (or located on) store shelves, as described above.

As described above with respect to arrangements 910 and 940 of FIG. 9, the plurality of detection elements may be substantially uniformly distributed across the store shelf. Alternatively, as described above with respect to arrangements 920, 930, 950, and 960 of FIG. 9, the plurality of detection elements may be distributed relative to the store shelf such that a first area of the store shelf has a higher density of detection elements than a second area of the store shelf. For example, the first area may comprise a front portion of the shelf, and the second area may comprise a back portion of the shelf.

In some embodiments, such as those including pressure sensors or other contact sensors as depicted in the example of FIG. 8A, step 1005 may include receiving the first signals from the first subset of detection elements as the plurality of products are placed above the first subset of detection elements. In some embodiments where the plurality of detection elements includes pressure detectors, the first signals may be indicative of pressure levels detected by pressure detectors corresponding to the first subset of detection elements after one or more of the plurality of products are placed on the at least one area of the store shelf associated with the first subset of detection elements. For example, the first signals may be indicative of pressure levels detected by pressure detectors corresponding to the first subset of detection elements after stocking at least one additional product above a product previously positioned on the shelf, removal of a product from the shelf, or the like. In other embodiments where the plurality of detection elements includes light detectors, the first signals may be indicative of light measurements made with respect to one or more of the plurality of products placed on the at least one area of the store shelf associated with the first subset of detection elements. Specifically, the first signals may be indicative of at least part of the ambient light being blocked from reaching the light detectors by the one or more of the plurality of products.

In embodiments including proximity sensors as depicted in the example of FIG. 8B, step 1005 may include receiving the first signals from the first subset of detection elements as the plurality of products are placed below the first subset of detection elements. In embodiments where the plurality of detection elements includes proximity detectors, the first signals may be indicative of proximity measurements made with respect to one or more of the plurality of products placed on the at least one area of the store shelf associated with the first subset of detection elements.

Method 1000 may include step 1010 of using the first signals to identify at least one pattern associated with a product type of the plurality of products. For example, any of the pattern matching techniques described above with respect to FIGS. 8A and 8B may be used for identification. A pattern associated with a product type may include a pattern (e.g., a continuous ring, a discontinuous ring of a certain number of points, a certain shape, etc.) associated with a base of a single product. The pattern associated with a product type may also be formed by a group of products. For example, a six-pack of soda cans may be associated with a pattern including a 2×3 array of continuous rings associated with the six cans of that product type. Additionally, a grouping of 2-liter bottles may form a detectable pattern including an array (whether uniform, irregular, or random) of discontinuous rings of pressure points, where the rings have a diameter associated with a particular 2-liter product. Various other types of patterns may also be detected (e.g., patterns associated with different product types arranged adjacent to one another, patterns associated with solid shapes (such as a rectangle of a boxed product), and so forth). In another example, an artificial neural network configured to recognize product types may be used to analyze the signals received by step 1005 (such as signals from pressure sensors, from light detectors, from contact sensors, and so forth) to determine product types associated with products placed on an area of a shelf (such as an area of a shelf associated with the first subset of detection elements). In yet another example, a machine learning algorithm trained using training examples to recognize product types may be used to analyze the signals received by step 1005 (such as signals from pressure sensors, from light detectors, from contact sensors, and so forth) to determine product types associated with products placed on an area of a shelf (such as an area of a shelf associated with the first subset of detection elements).
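
For illustration only, the sketch below shows a crude rule-based version of the base-pattern identification described above, e.g., treating a 2×3 array of can-sized rings as a six-pack. Ring extraction is assumed to be done elsewhere; the product names, ring counts, and diameter ranges are assumptions, not values from this disclosure.

```python
# Hedged sketch: rule-based identification from ring count and ring diameter
# (all thresholds and labels are assumed).
def identify_pattern(ring_centers: list[tuple[float, float]],
                     ring_diameter_cm: float) -> str:
    """Very rough matching on the number of detected rings and their diameter."""
    if len(ring_centers) == 6 and 5.0 <= ring_diameter_cm <= 7.0:
        return "six_pack_330ml_cans"      # 2x3 array of can-sized rings
    if len(ring_centers) == 1 and 9.0 <= ring_diameter_cm <= 12.0:
        return "bottle_2l"                # single large ring of pressure points
    if len(ring_centers) == 1 and 5.0 <= ring_diameter_cm <= 7.0:
        return "can_330ml"
    return "unknown"

print(identify_pattern([(0, 0), (6, 0), (12, 0), (0, 6), (6, 6), (12, 6)], 6.5))
```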

In some embodiments, step 1010 may further include accessing a memory storing data (e.g., memory device 226 of FIG. 2 and/or memory device 314 of FIG. 3A) associated with patterns of different types of products. In such embodiments, step 1010 may include using the first signals to identify at least one product of a first type using a first pattern (or a first product model) and at least one product of a second type using a second pattern (or a second product model). For example, the first type may include one brand (such as Coca-Cola® or Folgers®) while the second type may include another brand (such as Pepsi® or Maxwell House®). In this example, a size, shape, point spacing, weight, resistance, or other property of the first brand may be different from that of the second brand such that the detection elements may differentiate the brands. Such characteristics may also be used to differentiate like-branded, but different, products from one another (e.g., a 12-ounce can of Coca-Cola, versus a 16-ounce bottle of Coca-Cola, versus a 2-liter bottle of Coca-Cola). For example, a can of soda may have a base detectable by a pressure sensitive pad as a continuous ring. Further, the can of soda may be associated with a first weight signal having a value recognizable as associated with such a product. A 16-ounce bottle of soda may be associated with a base having four or five pressure points, which a pressure sensitive pad may detect as arranged in a pattern associated with a diameter typical of such a product. The 16-ounce bottle of soda may also be associated with a second weight signal having a value higher than the weight signal associated with the 12-ounce can of soda. Further still, a 2-liter bottle of soda may be associated with a base having a ring, four or five pressure points, etc. that a pressure sensitive pad may detect as arranged in a pattern associated with a diameter typical of such a product. The 2-liter bottle of soda may be associated with a weight signal having a value higher than the weight signals associated with the 12-ounce can of soda and the 16-ounce bottle of soda.

In the example of FIG. 8B, the different bottoms of product 853A and product 853B may be used to differentiate the products from each other. For example, detection elements such as pressure sensitive pads may be used to detect a product base shape and size (e.g., ring, pattern of points, asymmetric shape, base dimensions, and so forth). Such a base shape and size may be used (optionally together with one or more weight signals) to identify a particular product. The signals may also be used to identify and/or distinguish product types from one another. For example, a first type may include one category of product (such as soda cans) while a second type may include a different category of product (such as notepads). In another example, detection elements such as light detectors may be used to detect a product based on a pattern of light readings indicative of a product blocking at least part of the ambient light from reaching the light detectors. Such a pattern of light readings may be used to identify product type and/or product category and/or product shape. For example, products of a first type may block a first subset of light frequencies of the ambient light from reaching the light detectors, while products of a second type may block a second subset of light frequencies of the ambient light from reaching the light detectors (the first subset and the second subset may differ). In this case, the type of the products may be determined based on the light frequencies reaching the light detectors. In another example, products of a first type may have a first shape of shades and therefore may block ambient light from reaching light detectors arranged in one shape, while products of a second type may have a second shape of shades and therefore may block ambient light from reaching light detectors arranged in another shape. In this case, the type of the products may be determined based on the shape of the blocked ambient light. Any of the pattern matching techniques described above may be used for the identification.

Additionally or alternatively, step 1010 may include using the at least one pattern to determine a number of products placed on the at least one area of the store shelf associated with the first subset of detection elements. For example, any of the pattern matching techniques described above may be used to identify the presence of one or more product types and then to determine the number of products of each product type (e.g., by detecting a number of similarly sized and shaped product bases and optionally by detecting weight signals associated with each detected base). In another example, an artificial neural network configured to determine the number of products of selected product types may be used to analyze the signals received by step 1005 (such as signals from pressure sensors, from light detectors, from contact sensors, and so forth) to determine the number of products of selected product types placed on an area of a shelf (such as an area of a shelf associated with the first subset of detection elements). In yet another example, a machine learning algorithm trained using training examples to determine the number of products of selected product types may be used to analyze the signals received by step 1005 (such as signals from pressure sensors, from light detectors, from contact sensors, and so forth) to determine the number of products of selected product types placed on an area of a shelf (such as an area of a shelf associated with the first subset of detection elements). Additionally or alternatively, step 1010 may include extrapolating from a stored pattern associated with a single product (or type of product) to determine the number of products matching the first signals. In such embodiments, step 1010 may further include determining, for example based on product dimension data stored in a memory, a number of additional products that can be placed on the at least one area of the store shelf associated with the second subset of detection elements. For example, step 1010 may include extrapolating based on stored dimensions of each product and stored dimensions of the shelf area to determine an area and/or volume available for additional products. Step 1010 may further include extrapolation of the number of additional products based on the stored dimensions of each product and the determined available area and/or volume.
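
The sketch below illustrates, under simplified assumptions, the extrapolation described above: given stored product base dimensions and the dimensions of the empty shelf area, it estimates how many additional products fit on a rectangular grid. All dimensions are illustrative, and real packing would likely account for spacing, orientation, and depth limits.

```python
# Hedged sketch: estimating how many additional products fit in an empty
# shelf area, assuming simple rectangular-grid packing.
import math

def additional_products_fit(empty_area_cm: tuple[float, float],
                            product_base_cm: tuple[float, float]) -> int:
    empty_w, empty_d = empty_area_cm
    prod_w, prod_d = product_base_cm
    per_row = math.floor(empty_w / prod_w)
    rows = math.floor(empty_d / prod_d)
    return per_row * rows

# 60 cm x 40 cm of empty shelf, product base 7 cm x 7 cm -> 8 * 5 = 40 units.
print(additional_products_fit((60.0, 40.0), (7.0, 7.0)))
```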

Method 1000 may include step 1015 of receiving second signals from a second subset of detection elements (e.g., detection elements 851A and 851B of FIG. 8B) from among the plurality of detection elements, the second signals being indicative of no products being placed on at least one area of the store shelf associated with the second subset of detection elements. Using this information, method 1000 may include step 1020 of using the second signals to determine at least one empty space on the store shelf. For example, any of the pattern matching techniques described above may be used to determine that the second signals include default values or other values indicative of a lack of product in certain areas associated with a retail store shelf. A default value may include, for example, a pressure signal associated with an unloaded pressure sensor or pressure sensitive mat, indicating that no product is located in a certain region of a shelf. In another example, a default value may include signals from light detectors corresponding to ambient light, indicating that no product is located in a certain region of a shelf.

Method 1000 may include step 1025 of determining, based on the at least one pattern associated with a detected product and the at least one empty space, at least one aspect of planogram compliance. As explained above with respect to FIGS. 8A and 8B, the aspect of planogram compliance may include the presence or absence of particular products (or brands), locations of products on the shelves, quantities of products within particular areas (e.g., identifying stacked or clustered products), facing directions associated with the products (e.g., whether a product is outward facing, inward facing, askew, or the like), or the like. A planogram compliance determination may be made, for example, by determining a number of empty spaces on a shelf and determining a location of the empty spaces on the shelf. The planogram determination may also include determining weight signal magnitudes associated with detected products at the various detected non-empty locations. This information may be used by the one or more processors in determining whether a product facing specification has been satisfied (e.g., whether a front edge of a shelf has a suitable number of products or a suitable density of products), whether a specified stacking density has been achieved (e.g., by determining a pattern of detected products and weight signals of the detected products to determine how many products are stacked at each location), whether a product density specification has been achieved (e.g., by determining a ratio of empty locations to product-present locations), whether products of a selected product type are located in a selected area of the shelf, whether all products located in a selected area of the shelf are of a selected product type, whether a selected number of products (or a selected number of products of a selected product type) are located in a selected area of the shelf, whether products located in a selected area of a shelf are positioned in a selected orientation, or whether any other aspect of one or more planograms has been achieved.
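
As a hedged illustration of two of the checks listed above, the sketch below computes a product-density check (ratio of occupied to total cells) and a facing check along the front edge. The thresholds and the required facing count are assumptions for the example, not values defined by any particular planogram.

```python
# Hedged sketch: simple planogram-compliance checks on density and facings
# (thresholds are assumed).
def density_compliant(occupied_cells: int, total_cells: int,
                      min_density: float = 0.8) -> bool:
    return total_cells > 0 and occupied_cells / total_cells >= min_density

def facing_compliant(front_edge_products: int, required_facings: int) -> bool:
    return front_edge_products >= required_facings

print(density_compliant(occupied_cells=42, total_cells=48))          # True (0.875)
print(facing_compliant(front_edge_products=5, required_facings=6))   # False
```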

For example, the at least one aspect may include product homogeneity, and step 1025 may further include counting occurrences where a product of the second type is placed on an area of the store shelf associated with the first type of product. For example, by accessing a memory including base patterns (or any other type of pattern associated with product types, such as product models), the at least one processor may detect different products and product types. A product of a first type may be recognized based on a first pattern, and a product of a second type may be recognized based on a second, different pattern (optionally also based on weight signal information to aid in differentiating between products). Such information may be used, for example, to monitor whether a certain region of a shelf includes an appropriate or intended product or product type. Such information may also be useful in determining whether products or product types have been mixed (e.g., product homogeneity). Regarding planogram compliance, detection of different products and their relative locations on a shelf may aid in determining whether a product homogeneity value, ratio, etc. has been achieved. For example, the at least one processor may count occurrences where a product of a second type is placed on an area of the store shelf associated with a product of a first type.

Additionally or alternatively, the at least one aspect of planogram compliance may include a restocking rate, and step 1025 may further include determining the restocking rate based on a sensed rate at which products are added to the at least one area of the store shelf associated with the second subset of detection elements. Restocking rate may be determined, for example, by monitoring a rate at which detection element signals change as products are added to a shelf (e.g., when areas of a pressure sensitive pad change from a default value to a product-present value).

Additionally or alternatively, the at least one aspect of planogram compliance may include product facing, and step 1025 may further include determining the product facing based on a number of products determined to be placed on a selected area of the store shelf at a front of the store shelf. Such product facing may be determined by determining a number of products along a certain length of a front edge of a store shelf and determining whether the number of products complies with, for example, a specified density of products, a specified number of products, and so forth.

Step 1025 may further include transmitting an indicator of the at least one aspect of planogram compliance to a remote server. For example, as explained above with respect to FIGS. 8A and 8B, the indicator may comprise a data packet, a data file, or any other data structure indicating any variations from a planogram, e.g., with respect to product (or brand) placement, product facing direction, or the like. The remote server may include one or more computers associated with a retail store (e.g., so planogram compliance may be determined on a local basis within a particular store), one or more computers associated with a retail store evaluation body (e.g., so planogram compliance may be determined across a plurality of retail stores), one or more computers associated with a product manufacturer, one or more computers associated with a supplier (such as supplier 115), one or more computers associated with a market research entity (such as market research entity 110), etc.

Method 1000 may further include additional steps. For example, method 1000 may include identifying a change in at least one characteristic associated with one or more of the first signals (e.g., signals from a first group or type of detection elements), and in response to the identified change, triggering an acquisition of at least one image of the store shelf. The acquisition may be implemented by activating one or more of capturing devices 125 of FIGS. 4A-4C, as explained above. For example, the change in at least one characteristic associated with one or more of the first signals may be indicative of removal of at least one product from a location associated with the at least one area of the store shelf associated with the first subset of detection elements. Accordingly, method 1000 may include triggering the acquisition to determine whether restocking, reorganizing, or other intervention is required, e.g., to improve planogram compliance. Thus, method 1000 may include identifying a change in at least one characteristic associated with one or more of the first signals and, in response to the identified change, triggering a product-related task for an employee of the retail store.

Additionally or alternatively, method 1000 may be combined with method 1050 of FIG. 10B, described below, such that step 1055 is performed any time after step 1005.

FIG. 10B is a flow chart illustrating an exemplary method 1050 for triggering image capture of a store shelf, in accordance with the presently disclosed subject matter. It is contemplated that method 1050 may be used in conjunction with any of the detection element arrays discussed above with reference to, for example, FIGS. 8A, 8B and 9. The order and arrangement of steps in method 1050 is provided for purposes of illustration. As will be appreciated from this disclosure, modifications may be made to process 1050, for example, adding, combining, removing, and/or rearranging one or more steps of process 1050.

Method 1050 may include a step 1055 of determining a change in at least one characteristic associated with one or more first signals. For example, the first signals may have been captured as part of method 1000 of FIG. 10A, described above. For example, the first signals may include pressure readings when the plurality of detection elements includes pressure sensors, contact information when the plurality of detection elements includes contact sensors, light readings when the plurality of detection elements includes light detectors (for example, from light detectors configured to be placed adjacent to (or located on) a surface of a store shelf configured to hold products, as described above), and so forth.

Method 1050 may include step 1060 of using the first signals to identify at least one pattern associated with a product type of the plurality of products. For example, any of the pattern matching techniques described above with respect to FIGS. 8A, 8B, and step 1010 may be used for identification.

Method 1050 may include step 1065 of determining a type of event associated with the change. For example, a type of event may include a product removal, a product placement, movement of a product, or the like.

Method 1050 may include step 1070 of triggering an acquisition of at least one image of the store shelf when the change is associated with a first event type. For example, a first event type may include removal of a product, moving of a product, or the like, such that the first event type may trigger a product-related task for an employee of the retail store depending on analysis of the at least one image. The acquisition may be implemented by activating one or more of capturing devices 125 of FIGS. 4A-4C, as explained above. In some examples, the triggered acquisition may include an activation of at least one projector (such as projector 632). In some examples, the triggered acquisition may include acquisition of color images, depth images, stereo images, active stereo images, time-of-flight images, LIDAR images, RADAR images, and so forth.

Method 1050 may include a step (not shown) of forgoing the acquisition of at least one image of the store shelf when the change is associated with a second event type. For example, a second event type may include replacement of a removed product by a customer, stocking of a shelf by an employee, or the like. As another example, a second event type may include removal, placement, or movement of a product that is detected within a margin of error of the detection elements and/or detected within a threshold (e.g., removal of only one or two products; movement of a product by less than 5 cm, 20 cm, or the like; moving of a facing direction by less than 10 degrees; or the like), such that no image acquisition is required.
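
For illustration only, the sketch below combines steps 1065 and 1070 with the forgoing step: the detected change is classified into a first or second event type using thresholds mirroring the examples above, and only a first-type event triggers image acquisition. The function names and threshold values are assumptions for the example.

```python
# Hedged sketch: classify a detected change and decide whether to trigger
# image acquisition (thresholds mirror the examples above and are assumed).
def classify_event(products_changed: int, displacement_cm: float,
                   facing_change_deg: float) -> str:
    minor = (products_changed <= 2 and displacement_cm < 5.0
             and facing_change_deg < 10.0)
    return "second_type" if minor else "first_type"

def handle_change(products_changed: int, displacement_cm: float,
                  facing_change_deg: float) -> bool:
    """Return True when an image acquisition should be triggered."""
    return classify_event(products_changed, displacement_cm,
                          facing_change_deg) == "first_type"

print(handle_change(products_changed=1, displacement_cm=2.0,
                    facing_change_deg=3.0))   # False -> forgo acquisition
print(handle_change(products_changed=6, displacement_cm=30.0,
                    facing_change_deg=0.0))   # True -> trigger acquisition
```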

FIGS. 11A-11E illustrate example outputs based on data automatically derived from machine processing and analysis of images captured in retail store 105 according to disclosed embodiments. FIG. 11A illustrates an optional output for market research entity 110. FIG. 11B illustrates an optional output for supplier 115. FIGS. 11C and 11D illustrate optional outputs for employees of retail store 105. FIG. 11E illustrates optional outputs for user 120.

FIG. 11A illustrates an example graphical user interface (GUI) 1100 for output device 145A, representative of a GUI that may be used by market research entity 110. Consistent with the present disclosure, market research entity 110 may assist supplier 115 and other stakeholders in identifying emerging trends, launching new products, and/or developing merchandising and distribution plans across a large number of retail stores 105. By doing so, market research entity 110 may assist supplier 115 in growing product presence and maximizing or increasing new product sales. As mentioned above, market research entity 110 may be separate from or part of supplier 115. To successfully launch a new product, supplier 115 may use information about what really happens in retail store 105. For example, supplier 115 may want to monitor how marketing plans are being executed and to learn what competitors are doing relative to certain products or product types. Embodiments of the present disclosure may allow market research entity 110 and suppliers 115 to continuously monitor product-related activities at retail stores 105 (e.g., using system 100 to generate various metrics or information based on automated analysis of actual, timely images acquired from the retail stores). For example, in some embodiments, market research entity 110 may track how quickly or at what rate new products are introduced to retail store shelves, identify new products introduced by various entities, and assess a supplier's brand presence across different retail stores 105, among many other potential metrics.

In some embodiments, server 135 may provide market research entity 110 with information including shelf organization, analysis of SKU productivity trends, and various reports aggregating information on products appearing across large numbers of retail stores 105. For example, as shown in FIG. 11A, GUI 1100 may include a first display area 1102 for showing a percentage of promotion campaign compliance in different retail stores 105. GUI 1100 may also include a second display area 1104 showing a graph illustrating sales of a certain product relative to the out-of-shelf percentage. GUI 1100 may also include a third display area 1106 showing actual measurements of different factors relative to target goals (e.g., planogram compliance, restocking rate, price compliance, and other metrics). The provided information may enable market research entity 110 to give supplier 115 informed shelving recommendations and fine-tune promotional strategies according to in-store marketing trends, to provide store managers with a comparison of store performance against a group of retail stores 105 or against industry-wide performance, and so forth.

FIG. 11B illustrates an example GUI 1110 for output device 145B used by supplier 115. Consistent with the present disclosure, server 135 may use data derived from images captured in a plurality of retail stores 105 to recommend a planogram, which often determines sales success of different products. Using various analytics and planogram productivity measures, server 135 may help supplier 115 to determine an effective planogram with assurances that most if not all retail stores 105 can execute the plan. For example, the determined planogram may increase the probability that inventory is available for each retail store 105 and may be designed to decrease costs or to keep costs within a budget (such as inventory costs, restocking costs, shelf space costs, and so forth). Server 135 may also provide pricing recommendations based on the goals of supplier 115 and other factors. In other words, server 135 may help supplier 115 understand how much room to reserve for different products and how to make them available for favorable sales and profit impact (for example, by choosing the size of the shelf dedicated to a selected product, the location of the shelf, the height of the shelf, the neighboring products, and so forth). In addition, server 135 may monitor near real-time data from retail stores 105 to determine or confirm that retail stores 105 are compliant with the determined planogram of supplier 115. As used herein, the term “near real-time data,” in the context of this disclosure, refers to data acquired or generated, etc., based on sensor readings and other inputs (such as data from image sensors, audio sensors, pressure sensors, checkout stations, etc.) from retail store 105 received by system 100 within a predefined period of time (such as time periods having durations of less than a second, less than a minute, less than an hour, less than a day, less than a week, and so forth).
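As a non-authoritative illustration of the "near real-time data" notion defined above, the following sketch filters sensor readings by age; the SensorReading structure and the one-minute default window are assumptions made only for this example, and any of the durations listed above could be used instead.

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class SensorReading:
    # A single input from the retail store (image, pressure sample, checkout event, ...).
    source: str
    captured_at: datetime
    payload: bytes

def is_near_real_time(reading: SensorReading,
                      max_age: timedelta = timedelta(minutes=1),
                      now: datetime | None = None) -> bool:
    # Return True if the reading falls within the predefined freshness window.
    now = now or datetime.utcnow()
    return (now - reading.captured_at) <= max_age

def filter_near_real_time(readings: list[SensorReading],
                          max_age: timedelta) -> list[SensorReading]:
    # Keep only readings fresh enough for, e.g., planogram-compliance monitoring.
    return [r for r in readings if is_near_real_time(r, max_age)]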

In some embodiments, server 135 may generate reports that summarize performance of the current assortment and the planogram compliance. These reports may advise supplier 115 of the category and the item performance based on individual SKU, sub-segments of the category, vendor, and region. In addition, server 135 may provide suggestions or information upon which decisions may be made regarding how or when to remove markdowns and when to replace underperforming products. For example, as shown in FIG. 11B, GUI 1110 may include a first display area 1112 for showing different scores of supplier 115 relative to scores associated with its competitors. GUI 1110 may also include a second display area 1114 showing the market share of each competitor. GUI 1110 may also include a third display area 1116 showing retail measurements and distribution of brands. GUI 1110 may also include a fourth display area 1118 showing a suggested planogram. The provided information may help supplier 115 to select preferred planograms based on projected or observed profitability, etc., and to ensure that retail stores 105 are following the determined planogram.

FIGS. 11C and 11D illustrate example GUIs for output devices 145C, which may be used by employees of retail store 105. FIG. 11C depicts a GUI 1120 for a manager of retail store 105 designed for a desktop computer, and FIG. 11D depicts GUIs 1130 and 1140 for store staff designed for a handheld device. In-store execution is one of the challenges retail stores 105 have in creating a positive customer experience. Typical in-store execution may involve dealing with ongoing service events, such as a cleaning event, a restocking event, a rearrangement event, and more. In some embodiments, system 100 may improve in-store execution by providing adequate visibility to ensure that the right products are located at preferred locations on the shelf. For example, using near real-time data (e.g., captured images of store shelves), server 135 may generate customized online reports. Store managers and regional managers, as well as other stakeholders, may access custom dashboards and online reports to see how in-store conditions (such as planogram compliance, promotion compliance, price compliance, etc.) are affecting sales. This way, system 100 may enable managers of retail stores 105 to stay on top of pressing issues across the floor and assign employees to address issues that may negatively impact the customer experience.

In some embodiments, server 135 may generate real-time automated alerts when products are out of shelf (or near out of shelf), when pricing is inaccurate, when intended promotions are absent, and/or when there are issues with planogram compliance, among others. In the example shown in FIG. 11C, GUI 1120 may include a first display area 1122 for showing the average scores (for certain metrics) of a specific retail store 105 over a selected period of time. GUI 1120 may also include a second display area 1124 for showing a map of the specific retail store 105 with real-time indications of selected in-store execution events that require attention, and a third display area 1126 for showing a list of the selected in-store execution events that require attention. In another example, shown in FIG. 11D, GUI 1130 may include a first display area 1132 for showing a list of notifications or text messages indicating selected in-store execution events that require attention. The notifications or text messages may include a link to an image (or the image itself) of the specific aisle with the in-store execution event. In another example, shown in FIG. 11D, GUI 1140 may include a first display area 1142 for showing a display of a video stream captured by output device 145C (e.g., a real-time display or a near real-time display) with augmented markings indicating a status of planogram compliance for each product (e.g., correct place, misplaced, not in planogram, empty, and so forth). GUI 1140 may also include a second display area 1144 for showing a summary of the planogram compliance for all the products identified in the video stream captured by output device 145C. Consistent with the present disclosure, server 135 may generate, within minutes, actionable tasks to improve store execution. These tasks may help employees of retail store 105 to quickly address situations that can negatively impact revenue and customer experience in the retail store 105.

FIG. 11E illustrates an example GUI 1150 for output device 145D used by an online customer of retail store 105. Traditional online shopping systems present online customers with a list of products. Products selected for purchase may be placed into a virtual shopping cart until the customers complete their virtual shopping trip. Virtual shopping carts may be examined at any time, and their contents can be edited or deleted. However, common problems of traditional online shopping systems arise when the list of products on the website does not correspond with the actual products on the shelf. For example, an online customer may order a favorite cookie brand without knowing that the cookie brand is out of stock. Consistent with some embodiments, system 100 may use image data acquired by capturing devices 125 to provide the online customer with a near real-time display of the retail store and a list of the actual products on the shelf based on near real-time data. In one embodiment, server 135 may select images without occlusions in the field of view (e.g., without other customers, carts, etc.) for the near real-time display. In one embodiment, server 135 may blur or erase depictions of customers and other people from the near real-time display. As used herein, the term “near real-time display,” in the context of this disclosure, refers to image data captured in retail store 105 that was obtained by system 100 within a predefined period of time (such as less than a second, less than a minute, less than about 30 minutes, less than an hour, less than 3 hours, or less than 12 hours) from the time the image data was captured.

Consistent with the present disclosure, the near real-time display of retail store 105 may be presented to the online customer in a manner enabling easy virtual navigation in retail store 105. For example, as shown in FIG. 11E, GUI 1150 may include a first display area 1152 for showing the near real-time display and a second display area 1154 for showing a product list including products identified in the near real-time display. In some embodiments, first display area 1152 may include different GUI features (e.g., tabs 1156) associated with different locations or departments of retail store 105. By selecting each of the GUI features, the online customer can virtually jump to different locations or departments in retail store 105. For example, upon selecting the “bakery” tab, GUI 1150 may present a near real-time display of the bakery of retail store 105. In addition, first display area 1152 may include one or more navigational features (e.g., arrows 1158A and 1158B) for enabling the online customer to virtually move within a selected department and/or virtually walk through retail store 105. Server 135 may be configured to update the near real-time display and the product list upon determining that the online customer wants to virtually move within retail store 105. For example, after identifying a selection of arrow 1158B, server 135 may present a different section of the dairy department and may update the product list accordingly.

In another example, server 135 may update the near real-time display and the product list in response to new captured images and new information received from retail store 105. Using GUI 1150, the online customer may have a shopping experience that is as close as possible to actually being in retail store 105. For example, an online customer can visit the vegetable department and decide not to buy tomatoes after seeing that they are not ripe enough.

In some embodiments, a method, such as methods 700, 720, 1000, 1050, 1200, 1300, 1400, 1500 and 1600, may comprise one or more steps. In some examples, these methods, as well as all individual steps therein, may be performed by various aspects of capturing device 125, server 135, a cloud platform, a computational node, and so forth. For example, a system comprising at least one processor, such as processing device 202 and/or processing device 302, may perform any of these methods as well as all individual steps therein, for example by processing device 202 and/or processing device 302 executing software instructions stored within memory device 226 and/or memory device 314. In some examples, these methods, as well as all individual steps therein, may be performed by dedicated hardware. In some examples, a computer readable medium, such as a non-transitory computer readable medium, may store data and/or computer implementable instructions for carrying out any of these methods as well as all individual steps therein. Some non-limiting examples of possible execution manners of a method may include continuous execution (for example, returning to the beginning of the method once the method's normal execution ends), periodic execution, execution of the method at selected times, execution upon the detection of a trigger (some non-limiting examples of such a trigger may include a trigger from a user, a trigger from another process, a trigger from an external device, etc.), and so forth.
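Purely as an illustration of the execution manners listed above (continuous, periodic, and trigger-based), a minimal sketch might look like the following; the function names are hypothetical and no particular runtime is implied by the disclosure.

import time
from typing import Callable

def run_continuously(method: Callable[[], None]) -> None:
    # Continuous execution: restart the method once its normal execution ends.
    while True:
        method()

def run_periodically(method: Callable[[], None], period_seconds: float) -> None:
    # Periodic execution with a fixed delay between runs.
    while True:
        method()
        time.sleep(period_seconds)

def run_on_trigger(method: Callable[[], None], trigger_detected: Callable[[], bool]) -> None:
    # Execution upon detection of a trigger (from a user, another process, an external device, ...).
    while True:
        if trigger_detected():
            method()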

In some embodiments, machine learning algorithms (also referred to as machine learning models in the present disclosure) may be trained using training examples, for example by Step 1010, Step 1204, Step 1208, Step 1210, Step 1304, Step 1306, Step 1404, Step 1406, Step 1506 and Step 1606, and in the cases described herein. Some non-limiting examples of such machine learning algorithms may include classification algorithms, data regression algorithms, image segmentation algorithms, visual detection algorithms (such as object detectors, face detectors, person detectors, motion detectors, edge detectors, etc.), visual recognition algorithms (such as face recognition, person recognition, object recognition, etc.), speech recognition algorithms, mathematical embedding algorithms, natural language processing algorithms, support vector machines, random forests, nearest neighbors algorithms, deep learning algorithms, artificial neural network algorithms, convolutional neural network algorithms, recurrent neural network algorithms, linear machine learning models, non-linear machine learning models, ensemble algorithms, and so forth. For example, a trained machine learning algorithm may comprise an inference model, such as a predictive model, a classification model, a data regression model, a clustering model, a segmentation model, an artificial neural network (such as a deep neural network, a convolutional neural network, a recurrent neural network, etc.), a random forest, a support vector machine, and so forth. In some examples, the training examples may include example inputs together with the desired outputs corresponding to the example inputs. Further, in some examples, training machine learning algorithms using the training examples may generate a trained machine learning algorithm, and the trained machine learning algorithm may be used to estimate outputs for inputs not included in the training examples. In some examples, engineers, scientists, processes and machines that train machine learning algorithms may further use validation examples and/or test examples. For example, validation examples and/or test examples may include example inputs together with the desired outputs corresponding to the example inputs; a trained machine learning algorithm and/or an intermediately trained machine learning algorithm may be used to estimate outputs for the example inputs of the validation examples and/or test examples; the estimated outputs may be compared to the corresponding desired outputs; and the trained machine learning algorithm and/or the intermediately trained machine learning algorithm may be evaluated based on a result of the comparison. In some examples, a machine learning algorithm may have parameters and hyper-parameters, where the hyper-parameters may be set manually by a person or automatically by a process external to the machine learning algorithm (such as a hyper-parameter search algorithm), and the parameters of the machine learning algorithm may be set by the machine learning algorithm based on the training examples. In some implementations, the hyper-parameters may be set based on the training examples and the validation examples, and the parameters may be set based on the training examples and the selected hyper-parameters. For example, given the hyper-parameters, the parameters may be conditionally independent of the validation examples.
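A minimal sketch of the train/validate pattern described above follows, using scikit-learn as a stand-in library (the disclosure is not limited to any particular library or model); the synthetic features, labels, and the max_depth hyper-parameter grid are illustrative assumptions only.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical training examples: inputs X with desired outputs y
# (e.g., sensor-derived feature vectors labeled "product removed" / "not removed").
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

best_model, best_score, best_depth = None, -1.0, None
for max_depth in (2, 4, 8):                                   # hyper-parameter search
    model = RandomForestClassifier(max_depth=max_depth, random_state=0)
    model.fit(X_train, y_train)                               # parameters set from training examples
    score = accuracy_score(y_val, model.predict(X_val))       # evaluated on validation examples
    if score > best_score:
        best_model, best_score, best_depth = model, score, max_depth

# best_model can now estimate outputs for inputs not included in the training examples.
print(f"selected max_depth={best_depth}, validation accuracy={best_score:.2f}")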

In some embodiments, trained machine learning algorithms (also referred to as machine learning models and trained machine learning models in the present disclosure) may be used to analyze inputs and generate outputs, for example by Step 1010, Step 1204, Step 1208, Step 1210, Step 1304, Step 1306, Step 1404, Step 1406, Step 1506 and Step 1606, and in the cases described below. In some examples, a trained machine learning algorithm may be used as an inference model that when provided with an input generates an inferred output. For example, a trained machine learning algorithm may include a classification algorithm, the input may include a sample, and the inferred output may include a classification of the sample (such as an inferred label, an inferred tag, and so forth). In another example, a trained machine learning algorithm may include a regression model, the input may include a sample, and the inferred output may include an inferred value corresponding to the sample. In yet another example, a trained machine learning algorithm may include a clustering model, the input may include a sample, and the inferred output may include an assignment of the sample to at least one cluster. In an additional example, a trained machine learning algorithm may include a classification algorithm, the input may include an image, and the inferred output may include a classification of an item depicted in the image. In yet another example, a trained machine learning algorithm may include a regression model, the input may include an image, and the inferred output may include an inferred value corresponding to an item depicted in the image (such as an estimated property of the item, such as size, volume, age of a person depicted in the image, cost of a product depicted in the image, and so forth). In an additional example, a trained machine learning algorithm may include an image segmentation model, the input may include an image, and the inferred output may include a segmentation of the image. In yet another example, a trained machine learning algorithm may include an object detector, the input may include an image, and the inferred output may include one or more detected objects in the image and/or one or more locations of objects within the image. In some examples, the trained machine learning algorithm may include one or more formulas and/or one or more functions and/or one or more rules and/or one or more procedures, the input may be used as input to the formulas and/or functions and/or rules and/or procedures, and the inferred output may be based on the outputs of the formulas and/or functions and/or rules and/or procedures (for example, selecting one of the outputs of the formulas and/or functions and/or rules and/or procedures, using a statistical measure of the outputs of the formulas and/or functions and/or rules and/or procedures, and so forth).

In some embodiments, artificial neural networks may be configured to analyze inputs and generate corresponding outputs, for example by Step 1010, Step 1210, Step 1306, Step 1406, Step 1506 and Step 1606, and in the cases described below. Some non-limiting examples of such artificial neural networks may comprise shallow artificial neural networks, deep artificial neural networks, feedback artificial neural networks, feed forward artificial neural networks, autoencoder artificial neural networks, probabilistic artificial neural networks, time delay artificial neural networks, convolutional artificial neural networks, recurrent artificial neural networks, long short term memory artificial neural networks, and so forth. In some examples, an artificial neural network may be configured manually. For example, a structure of the artificial neural network may be selected manually, a type of an artificial neuron of the artificial neural network may be selected manually, a parameter of the artificial neural network (such as a parameter of an artificial neuron of the artificial neural network) may be selected manually, and so forth. In some examples, an artificial neural network may be configured using a machine learning algorithm. For example, a user may select hyper-parameters for the artificial neural network and/or the machine learning algorithm, and the machine learning algorithm may use the hyper-parameters and training examples to determine the parameters of the artificial neural network, for example using back propagation, using gradient descent, using stochastic gradient descent, using mini-batch gradient descent, and so forth. In some examples, an artificial neural network may be created from two or more other artificial neural networks by combining the two or more other artificial neural networks into a single artificial neural network.

Some non-limiting examples of image data may include images, grayscale images, color images, 2D images, 3D images, videos, 2D videos, 3D videos, frames, footage, data derived from other image data, and so forth. In some embodiments, analyzing image data (for example by the methods, steps and modules described herein, such as Step 724, Step 1210, Step 1306, Step 1406, Step 1506 and Step 1606) may comprise analyzing the image data to obtain preprocessed image data, and subsequently analyzing the image data and/or the preprocessed image data to obtain the desired outcome. One of ordinary skill in the art will recognize that the following are examples, and that the image data may be preprocessed using other kinds of preprocessing methods. In some examples, the image data may be preprocessed by transforming the image data using a transformation function to obtain transformed image data, and the preprocessed image data may comprise the transformed image data. For example, the transformed image data may comprise one or more convolutions of the image data. For example, the transformation function may comprise one or more image filters, such as low-pass filters, high-pass filters, band-pass filters, all-pass filters, and so forth. In some examples, the transformation function may comprise a nonlinear function. In some examples, the image data may be preprocessed by smoothing at least parts of the image data, for example using Gaussian convolution, using a median filter, and so forth. In some examples, the image data may be preprocessed to obtain a different representation of the image data. For example, the preprocessed image data may comprise: a representation of at least part of the image data in a frequency domain; a Discrete Fourier Transform of at least part of the image data; a Discrete Wavelet Transform of at least part of the image data; a time/frequency representation of at least part of the image data; a representation of at least part of the image data in a lower dimension; a lossy representation of at least part of the image data; a lossless representation of at least part of the image data; a time ordered series of any of the above; any combination of the above; and so forth. In some examples, the image data may be preprocessed to extract edges, and the preprocessed image data may comprise information based on and/or related to the extracted edges. In some examples, the image data may be preprocessed to extract image features from the image data. Some non-limiting examples of such image features may comprise information based on and/or related to: edges; corners; blobs; ridges; Scale Invariant Feature Transform (SIFT) features; temporal features; and so forth. In some examples, analyzing the image data may include calculating at least one convolution of at least a portion of the image data, and using the calculated at least one convolution to calculate at least one resulting value and/or to make determinations, identifications, recognitions, classifications, and so forth.
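The following sketch illustrates a few of the preprocessing operations listed above (Gaussian smoothing, a convolution with a high-pass kernel, edge extraction, and a frequency-domain representation); the kernel values and the sigma are arbitrary placeholders rather than values taken from the disclosure.

import numpy as np
from scipy.ndimage import gaussian_filter, sobel
from scipy.signal import convolve2d

def preprocess(image: np.ndarray) -> dict[str, np.ndarray]:
    # image: 2D grayscale array (H, W), float or uint8.
    img = image.astype(np.float64)

    smoothed = gaussian_filter(img, sigma=1.5)          # Gaussian convolution (smoothing)

    laplacian_kernel = np.array([[0, 1, 0],
                                 [1, -4, 1],
                                 [0, 1, 0]], dtype=np.float64)
    high_pass = convolve2d(img, laplacian_kernel, mode="same", boundary="symm")

    edges = np.hypot(sobel(smoothed, axis=0), sobel(smoothed, axis=1))  # edge magnitude

    spectrum = np.abs(np.fft.fft2(img))                 # frequency-domain representation

    return {"smoothed": smoothed, "high_pass": high_pass,
            "edges": edges, "spectrum": spectrum}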

In some embodiments, analyzing image data (for example by the methods, steps and modules described herein, such as Step 724, Step 1210, Step 1306, Step 1406, Step 1506 and Step 1606) may comprise analyzing the image data and/or the preprocessed image data using one or more rules, functions, procedures, artificial neural networks, object detection algorithms, face detection algorithms, visual event detection algorithms, action detection algorithms, motion detection algorithms, background subtraction algorithms, inference models, and so forth. Some non-limiting examples of such inference models may include: an inference model preprogrammed manually; a classification model; a regression model; a result of training algorithms, such as machine learning algorithms and/or deep learning algorithms, on training examples, where the training examples may include examples of data instances, and in some cases, a data instance may be labeled with a corresponding desired label and/or result; and so forth. In some embodiments, analyzing image data (for example by the methods, steps and modules described herein, such as Step 724, Step 1210, Step 1306, Step 1406, Step 1506 and Step 1606) may comprise analyzing pixels, voxels, point cloud, range data, etc. included in the image data.

Some non-limiting examples of infrared data (also referred to as infrared input data in the present disclosure) may include any data captured using infrared sensors. Some non-limiting examples of infrared sensors may include at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors, photovoltaic infrared sensors and thermographic cameras. For example, an infrared sensor may include a radiation-sensitive optoelectronic component with a spectral sensitivity in the infrared wavelength range (780 nm to 50 μm). In some examples, the infrared data may be or include an infrared image and/or an infrared video, and any technique for analyzing image data may be used to analyze the infrared image and/or the infrared video, including the image analysis techniques described above. In some examples, the infrared data may be or include time series data of a plurality of data instances captured using infrared sensors and indexed in time order, and any technique for analyzing time series data may be used to analyze the infrared data. In some examples, the infrared data may be or include a single measured value, and the analysis of the infrared data may include basing a determination on the single measured value. In some embodiments, analyzing infrared data (for example by the methods, steps and modules described herein, such as Step 1204, Step 1208, Step 1404 and Step 1506) may comprise analyzing the infrared data to obtain preprocessed infrared data, and subsequently analyzing the infrared data and/or the preprocessed infrared data to obtain the desired outcome. One of ordinary skill in the art will recognize that the following are examples, and that the infrared data may be preprocessed using other kinds of preprocessing methods. In some examples, the infrared data may be preprocessed by transforming the infrared data using a transformation function to obtain transformed infrared data, and the preprocessed infrared data may comprise the transformed infrared data. For example, the transformed infrared data may comprise one or more convolutions of the infrared data. For example, the transformation function may comprise at least one of low-pass filters, high-pass filters, band-pass filters, all-pass filters, and so forth. In some examples, the transformation function may comprise a nonlinear function. In some examples, the infrared data may be preprocessed by smoothing at least parts of the infrared data, for example using Gaussian convolution, using a median filter, and so forth. In some examples, the infrared data may be preprocessed to obtain a different representation of the infrared data. For example, the preprocessed infrared data may comprise: a representation of at least part of the infrared data in a lower dimension; a lossy representation of at least part of the infrared data; a lossless representation of at least part of the infrared data; a time ordered series of any of the above; any combination of the above; and so forth. In some examples, analyzing the infrared data may include calculating at least one convolution of at least a portion of the infrared data, and using the calculated at least one convolution to calculate at least one resulting value and/or to make determinations, identifications, recognitions, classifications, and so forth.

In some embodiments, analyzing infrared data (for example by the methods, steps and modules described herein, such as Step 1204, Step 1208, Step 1404 and Step 1506) may comprise analyzing the infrared data and/or the preprocessed infrared data using one or more rules, functions, procedures, artificial neural networks, object detection algorithms, motion detection algorithms, inference models, and so forth. Some non-limiting examples of such inference models may include: an inference model preprogrammed manually; a classification model; a regression model; a result of training algorithms, such as machine learning algorithms and/or deep learning algorithms, on training examples, where the training examples may include examples of data instances, and in some cases, a data instance may be labeled with a corresponding desired label and/or result; and so forth.

In some embodiments, infrared data may be captured using one or more infrared sensors (for example by the methods, steps and modules described herein, such as Step 1202, Step 1206, Step 1402 and Step 1502). Some non-limiting examples of such infrared sensors may include at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors and photovoltaic infrared sensors. In some examples, at least one of the one or more infrared sensors may be fixedly mounted on one side of an aisle and directed such that it may capture infrared data of the middle of the aisle and/or of the opposing side of the aisle. For example, the at least one of the one or more infrared sensors may be positioned on one side of aisle 400, for example in a similar fashion to capturing devices 125A, 125B, and 125C as illustrated in FIG. 4A. In some examples, at least one of the one or more infrared sensors may be positioned under a retail shelf and/or between two retail shelves. For example, the at least one of the one or more infrared sensors may be positioned under retail shelf 622E, for example in a similar fashion to housing 5041 as illustrated in FIG. 6B. In another example, the at least one of the one or more infrared sensors may be positioned between retail shelf 622B and retail shelf 622E, for example in a similar fashion to housing 5041 as illustrated in FIG. 6B. In yet another example, the at least one of the one or more infrared sensors may be included in housing 5041. In some examples, at least one of the one or more infrared sensors may be mounted to a surface of a shelving unit (such as retail shelving unit 620, a rack of shelves, a unit including multiple shelves mounted to a wall, etc.) that is perpendicular to the shelves (such as a surface of the back of a rack, a surface of the wall, etc.).

Some non-limiting examples of vibration data may include any data captured using vibration sensors. Some non-limiting examples of vibration sensors may include at least one of accelerometers, piezoelectric sensors, piezoresistive sensors, capacitive MEMS sensors, displacement sensors, velocity sensors, laser based vibration sensors, and so forth. In some examples, the vibration data may be or include a vibration image and/or a vibration video, and any technique for analyzing image data may be used to analyze the vibration image and/or the vibration video, including the image analysis techniques described above. In some examples, the vibration data may be or include time series data of a plurality of data instances captured using vibration sensors and indexed in time order, and any technique for analyzing time series data may be used to analyze the vibration data. In some examples, the vibration data may be or include a single measured value, and the analysis of the vibration data may include basing a determination on the single measured value. In some embodiments, analyzing vibration data (for example by the methods, steps and modules described herein, such as Step 1304 and Step 1606) may comprise analyzing the vibration data to obtain preprocessed vibration data, and subsequently analyzing the vibration data and/or the preprocessed vibration data to obtain the desired outcome. One of ordinary skill in the art will recognize that the following are examples, and that the vibration data may be preprocessed using other kinds of preprocessing methods. In some examples, the vibration data may be preprocessed by transforming the vibration data using a transformation function to obtain transformed vibration data, and the preprocessed vibration data may comprise the transformed vibration data. For example, the transformed vibration data may comprise one or more convolutions of the vibration data. For example, the transformation function may comprise at least one of low-pass filters, high-pass filters, band-pass filters, all-pass filters, and so forth. In some examples, the transformation function may comprise a nonlinear function. In some examples, the vibration data may be preprocessed by smoothing at least parts of the vibration data, for example using Gaussian convolution, using a median filter, and so forth. In some examples, the vibration data may be preprocessed to obtain a different representation of the vibration data. For example, the preprocessed vibration data may comprise: a representation of at least part of the vibration data in a lower dimension; a lossy representation of at least part of the vibration data; a lossless representation of at least part of the vibration data; a time ordered series of any of the above; any combination of the above; and so forth. In some examples, analyzing the vibration data may include calculating at least one convolution of at least a portion of the vibration data, and using the calculated at least one convolution to calculate at least one resulting value and/or to make determinations, identifications, recognitions, classifications, and so forth.
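As one possible illustration of vibration-data preprocessing, the sketch below smooths a one-dimensional accelerometer trace and retains only a band of frequencies; the 200 Hz sampling rate and the 5-30 Hz band are assumptions made for the example and are not values from the disclosure.

import numpy as np

def bandpass(signal: np.ndarray, fs: float, low_hz: float, high_hz: float) -> np.ndarray:
    # Zero out FFT bins outside [low_hz, high_hz] and transform back (a crude band-pass filter).
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)

def smooth(signal: np.ndarray, window: int = 5) -> np.ndarray:
    # Moving-average smoothing (a simple convolution of the vibration data).
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

# Usage: a shelf-mounted accelerometer sampled at 200 Hz (simulated trace).
fs = 200.0
t = np.arange(0, 2.0, 1.0 / fs)
raw = 0.02 * np.sin(2 * np.pi * 12 * t) + 0.005 * np.random.default_rng(0).normal(size=t.size)
preprocessed = smooth(bandpass(raw, fs, low_hz=5.0, high_hz=30.0))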

In some embodiments, analyzing vibration data (for example by the methods, steps and modules described herein, such as Step 1304 and Step 1606) may comprise analyzing the vibration data and/or the preprocessed vibration data using one or more rules, functions, procedures, artificial neural networks, object detection algorithms, motion detection algorithms, inference models, and so forth. Some non-limiting examples of such inference models may include: an inference model preprogrammed manually; a classification model; a regression model; a result of training algorithms, such as machine learning algorithms and/or deep learning algorithms, on training examples, where the training examples may include examples of data instances, and in some cases, a data instance may be labeled with a corresponding desired label and/or result; and so forth.

In some embodiments, vibration data may be captured using one or more vibration sensors (for example by the methods, steps and modules described herein, such as Step 1302 and Step 1602). Some non-limiting examples of such vibration sensors may include at least one of an accelerometer, a piezoelectric sensor, a piezoresistive sensor, a capacitive MEMS sensor, a displacement sensor, a velocity sensor, a laser based vibration sensor, and so forth. In some examples, at least one of the one or more vibration sensors may be physically connected to at least one retail shelf, for example above the at least one retail shelf, below the at least one retail shelf, to the side of the at least one retail shelf, to an internal part of the at least one retail shelf, and so forth. For example, the at least one of the one or more vibration sensors may be physically connected to retail shelf 622E, for example in a similar fashion to housing 5041 as illustrated in FIG. 6B. In another example, at least one of the one or more vibration sensors may be physically connected to a shelving unit, for example to a part of the shelving unit that is not a shelf, such as a surface of a shelving unit (such as retail shelving unit 620, a rack of shelves, a unit including multiple shelves mounted to a wall, etc.) that is perpendicular to the shelves (such as a surface of the back of the rack, a surface of the wall, etc.). In yet another example, at least one of the one or more vibration sensors may not be physically connected to a shelving unit or a retail shelf.

Image processing of images and videos captured from a retail environment may be a burdensome task. Processing the images and videos in the retail environment may require placing expensive hardware in the retail environment. Further, image and video processing may consume a significant amount of power, which may be challenging for battery powered systems. On the other hand, transmitting images and videos to a remote system (such as a server or a cloud platform) for processing may be challenging due to the large size of images and videos. Therefore, it is desirable to reduce the number of images and videos processed, and to limit the parts of the images and videos that are transmitted or processed, to the images and videos, or the parts of the images and videos, that include relevant information.

In some examples, systems, methods and computer-readable media for triggering image processing based on infrared data analysis are provided.

FIG. 12 provides a flowchart of an exemplary method 1200 for triggering image processing based on infrared data analysis, consistent with the present disclosure. In this example, method 1200 may comprise receiving first infrared input data captured using a first group of one or more infrared sensors (Step 1202); analyzing the first infrared input data to detect an engagement of a person with a retail shelf (Step 1204); receiving second infrared input data captured using a second group of one or more infrared sensors after the capturing of the first infrared input data (Step 1206); analyzing the second infrared input data to determine a completion of the engagement of the person with the retail shelf (Step 1208); in response to the determined completion of the engagement of the person with the retail shelf, analyzing at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf (Step 1210); and using the analysis of the at least one image to determine a state of the retail shelf (Step 1212). In some examples, method 1200 may further comprise providing information based on the state of the retail shelf determined by Step 1212. For example, providing the information based on the state of the retail shelf may comprise at least one of storing the information in memory, transmitting the information to an external device, providing the information to a user (for example, visually, audibly, textually, etc.), and so forth. Additionally or alternatively to Step 1212, method 1200 may further comprise providing information based on the analysis of the at least one image by Step 1210. For example, providing the information based on the analysis of the at least one image by Step 1210 may comprise at least one of storing the information in memory, transmitting the information to an external device, providing the information to a user (for example, visually, audibly, textually, through a user interface, etc.), and so forth.
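A schematic, self-contained sketch of the control flow of method 1200 is given below; the sensor and camera callables, the temperature threshold, and the placeholder shelf-state analysis are hypothetical stand-ins rather than APIs or parameters defined by the disclosure.

from typing import Callable, Sequence

BODY_TEMP_THRESHOLD = 30.0   # degrees C; separates ambient from body temperature (assumed value)

def detect_engagement(ir_reading: Sequence[float]) -> bool:           # Step 1204 (simplified)
    return max(ir_reading) > BODY_TEMP_THRESHOLD

def engagement_completed(ir_reading: Sequence[float]) -> bool:        # Step 1208 (simplified)
    return max(ir_reading) <= BODY_TEMP_THRESHOLD

def determine_shelf_state(image: Sequence[Sequence[float]]) -> str:   # Steps 1210-1212 (placeholder)
    # Placeholder "analysis": treat dark pixels as empty shelf space.
    flat = [p for row in image for p in row]
    empty_ratio = sum(p < 0.2 for p in flat) / len(flat)
    return "restock needed" if empty_ratio > 0.5 else "ok"

def method_1200(read_first_ir: Callable[[], Sequence[float]],
                read_second_ir: Callable[[], Sequence[float]],
                capture_image: Callable[[], Sequence[Sequence[float]]]) -> str | None:
    if not detect_engagement(read_first_ir()):                         # Steps 1202-1204
        return None
    while not engagement_completed(read_second_ir()):                  # Steps 1206-1208
        pass
    state = determine_shelf_state(capture_image())                     # Steps 1210-1212
    print(f"shelf state: {state}")                                     # provide information
    return state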

In some examples, Step 1202 may comprise receiving first infrared input data captured using a first group of one or more infrared sensors. For example, receiving the first infrared input data by Step 1202 may comprise at least one of reading the first infrared input data, receiving the first infrared input data from an external device (for example, using a digital communication device), capturing the first infrared input data using the first group of one or more infrared sensors, and so forth. In some examples, the first group of one or more infrared sensors may be a group of at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors and photovoltaic infrared sensors. In one example, the first group of one or more infrared sensors may be a group of one or more passive infrared sensors. In some examples, the first group of one or more infrared sensors may be a group of one or more infrared sensors positioned below a second retail shelf. In one example, the second retail shelf may be positioned above the retail shelf. For example, the first group of one or more infrared sensors may be a group of one or more infrared sensors mounted to the second retail shelf, mounted to a surface (for example, of a wall, of a rack, etc.) connecting the second retail shelf and the retail shelf, and so forth.

In some examples, Step 1204 may comprise analyzing the first infrared input data received by Step 1202 to detect an engagement of a person with a retail shelf. In one example, a machine learning model may be trained using training examples to detect engagements of people with retail shelves from infrared data. An example of such a training example may include sample infrared data, together with a label indicating whether the sample infrared data corresponds to an engagement of a person with a retail shelf. In one example, Step 1204 may use the trained machine learning model to analyze the first infrared input data received by Step 1202 to detect the engagement of the person with the retail shelf. In another example, Step 1204 may compare the first infrared input data or a preprocessed version of the first infrared input data (such as a function of the first infrared input data) with a threshold, and may use a result of the comparison to detect the engagement of the person with the retail shelf. For example, the threshold may differentiate between an ambient temperature of an environment of the retail shelf and a typical human body temperature. In an additional example, the threshold may be selected based on a statistical measure of infrared data captured using the first group of one or more infrared sensors of Step 1202 over time. In some examples, Step 1204 may calculate a convolution of at least part of the first infrared input data received by Step 1202. Further, in response to a first value of the calculated convolution of the at least part of the first infrared input data, Step 1204 may detect the engagement of a person with a retail shelf, and in response to a second value of the calculated convolution of the at least part of the first infrared input data, Step 1204 may forgo detecting the engagement of a person with a retail shelf.
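As an illustration of the threshold-based variant of Step 1204 described above, the sketch below derives the threshold from a statistical measure of past infrared readings (mean plus a multiple of the standard deviation); the constants and the simulated readings are assumptions for the example only.

import numpy as np

def adaptive_threshold(history: np.ndarray, k: float = 3.0) -> float:
    # Threshold above the ambient baseline estimated from infrared readings collected over time.
    return float(history.mean() + k * history.std())

def engagement_detected(current_reading: float, history: np.ndarray) -> bool:
    # Step 1204 (simplified): compare the current reading with the threshold.
    return current_reading > adaptive_threshold(history)

# Usage: ambient readings around 21 degrees C, then a warmer reading while a person is near the shelf.
ambient = np.random.default_rng(1).normal(loc=21.0, scale=0.3, size=600)
print(engagement_detected(21.4, ambient))   # False: still ambient
print(engagement_detected(29.0, ambient))   # True: likely a person engaging with the shelf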

In some examples, Step 1206 may comprise receiving second infrared input data captured using a second group of one or more infrared sensors after the capturing of the first infrared input data by Step 1202. For example, receiving the second infrared input data by Step 1206 may comprise at least one of reading the second infrared input data, receiving the second infrared input data from an external device (for example, using a digital communication device), capturing the second infrared input data using the second group of one or more infrared sensors, and so forth. In some examples, the second group of one or more infrared sensors may be a group of at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors and photovoltaic infrared sensors. In one example, the second group of one or more infrared sensors may be a group of one or more passive infrared sensors. In one example, the first group of one or more infrared sensors may be identical to the second group of one or more infrared sensors. In another example, the first group of one or more infrared sensors may differ from the second group of one or more infrared sensors. In yet another example, the first group of one or more infrared sensors and the second group of one or more infrared sensors may include at least one common infrared sensor. In an additional example, the first group of one or more infrared sensors and the second group of one or more infrared sensors may include no common infrared sensor. In some examples, the second group of one or more infrared sensors may be a group of one or more infrared sensors positioned below a second retail shelf. In one example, the second retail shelf may be positioned above the retail shelf. For example, the second group of one or more infrared sensors may be a group of one or more infrared sensors mounted to the second retail shelf, mounted to a surface (for example, of a wall, of a rack, etc.) connecting the second retail shelf and the retail shelf, and so forth.

In some examples, Step 1208 may comprise analyzing the second infrared input data received by Step 1206 to determine a completion of the engagement of the person with the retail shelf detected by Step 1204. In one example, a machine learning model may be trained using training examples to determine completions of engagements of people with retail shelves from infrared data. An example of such a training example may include sample infrared data, together with a label indicating whether the sample infrared data corresponds to a completion of an engagement of a person with a retail shelf. In one example, Step 1208 may use the trained machine learning model to analyze the second infrared input data received by Step 1206 to determine the completion of the engagement of the person with the retail shelf. In another example, Step 1208 may compare the second infrared input data or a preprocessed version of the second infrared input data (such as a function of the second infrared input data) with a threshold, and may use a result of the comparison to determine the completion of the engagement of the person with the retail shelf. For example, the threshold may differentiate between an ambient temperature of an environment of the retail shelf and a typical human body temperature. In another example, the threshold may be selected based on an analysis of the first infrared input data received by Step 1202, for example, based on a value of a statistical measure of the first infrared input data. In an additional example, the threshold may be selected based on a statistical measure of infrared data captured using the second group of one or more infrared sensors of Step 1206 over time. In yet another example, the threshold of Step 1208 may be identical to or different from the threshold of Step 1204. In some examples, the determination of the completion of the engagement of the person with the retail shelf by Step 1208 may be a determination that the person cleared an environment of the retail shelf. In some examples, Step 1208 may calculate a convolution of at least part of the second infrared input data received by Step 1206. Further, in response to a first value of the calculated convolution of the at least part of the second infrared input data, Step 1208 may determine a completion of the engagement of the person with the retail shelf detected by Step 1204, and in response to a second value of the calculated convolution of the at least part of the second infrared input data, Step 1208 may determine that the engagement of the person with the retail shelf detected by Step 1204 is not completed.

In some examples, Step 1210 may comprise, for example in response to the determined completion of the engagement of the person with the retail shelf by Step 1208, analyzing at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf. The analysis of the at least one image of the retail shelf may include any image analysis described herein. For example, Step 1210 may analyze the at least one image of the retail shelf using at least one of image processing instructions 232, Step 724, Step 726 and Step 728. In another example, Step 1210 may analyze the at least one image of the retail shelf using any of the techniques for analyzing image data described above. In yet another example, Step 1210 may analyze the at least one image of the retail shelf using at least one of an image classification algorithm, an object recognition algorithm, a product recognition algorithm, a label recognition algorithm, a logo recognition algorithm and a semantic segmentation algorithm. In some examples, a machine learning model may be trained using training examples to analyze images. An example of such a training example may include a sample image, together with a label indicating a desired outcome corresponding to the analysis of the sample image. In one example, Step 1210 may use the trained machine learning model to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to obtain an outcome of the analysis. In some examples, Step 1210 may use an artificial neural network to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to obtain an outcome of the analysis, for example as described above. In some examples, Step 1210 may base the analysis of the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf on a calculated convolution of at least part of the at least one image. In some examples, for example in response to the determined completion of the engagement of the person with the retail shelf by Step 1208, Step 1210 may further comprise triggering the capturing of the at least one image of the retail shelf using the at least one image sensor. In some examples, the at least one image sensor of Step 1210 may be at least one image sensor mounted to a second retail shelf. For example, the second retail shelf may be positioned on an opposite side of an aisle from the retail shelf. In another example, the second retail shelf may be positioned above the retail shelf. In yet another example, the second retail shelf may be positioned above the retail shelf and the at least one image sensor may be positioned below the second retail shelf. In some examples, the at least one image sensor of Step 1210 may be at least one image sensor mounted to an image capturing robot. In some examples, the at least one image sensor of Step 1210 may be at least one image sensor mounted to a ceiling of a retail store. In some examples, the at least one image sensor of Step 1210 may be part of a personal mobile device.

In some examples, Step 1212 may comprise using the analysis of the at least one image to determine a state of the retail shelf. In some examples, Step 1210 may analyze the at least one image to obtain an outcome of the analysis. In one example, in response to a first outcome of the analysis of Step 1210, Step 1212 may determine a first state of the retail shelf, and in response to a second outcome of the analysis of Step 1210, Step 1212 may determine a second state of the retail shelf, where the second state of the retail shelf may differ from the first state of the retail shelf. In some examples, Step 1210 may recognize products and/or labels associated with the retail shelf, and Step 1212 may determine the state of the retail shelf based on the products and/or labels associated with the retail shelf. In some examples, a machine learning model may be trained using training examples to determine states of retail shelves from images. An example of such a training example may include a sample image of a sample retail shelf, together with a label indicating a state of the sample retail shelf. In one example, Steps 1210 and 1212 may use the trained machine learning model to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to determine the state of the retail shelf. In some examples, Steps 1210 and 1212 may use an artificial neural network to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to determine the state of the retail shelf. In some examples, Steps 1210 and 1212 may use an image classification model to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to determine the state of the retail shelf, for example where each class of the classification model corresponds to a different state of the retail shelf. In some examples, Steps 1210 and 1212 may use a regression model to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to determine at least one aspect of the state of the retail shelf (such as a number of products on the retail shelf, a score corresponding to the retail shelf, a size of an empty space on the retail shelf, and so forth). In some examples, the state of the retail shelf determined by Step 1212 may include inventory data associated with products on the retail shelf after the engagement of the person with the retail shelf. In some examples, the state of the retail shelf determined by Step 1212 may include facings data associated with products on the retail shelf after the engagement of the person with the retail shelf. In some examples, the state of the retail shelf determined by Step 1212 may include a planogram compliance status associated with the retail shelf after the engagement of the person with the retail shelf. In some examples, the state of the retail shelf determined by Step 1212 may include an empty space indication associated with the retail shelf after the engagement of the person with the retail shelf.

In some examples, Step 1212 may comprise using the analysis of the at least one image by Step 1210 and an analysis of one or more images of the retail shelf captured using the at least one image sensor before the engagement of the person with the retail shelf to determine a change associated with the retail shelf during the engagement of the person with the retail shelf. Some non-limiting examples of such a change may include a product placed on the retail shelf, a product moved from one position on the retail shelf to another position on the retail shelf, a product removed from the retail shelf, and so forth. For example, Step 1212 may compare the state of the retail shelf before the engagement of the person with the retail shelf (determined based on the analysis of the one or more images of the retail shelf captured using the at least one image sensor before the engagement of the person with the retail shelf) and the state of the retail shelf after the completion of the engagement of the person with the retail shelf (determined based on the analysis of the at least one image by Step 1210) to determine the change associated with the retail shelf during the engagement of the person with the retail shelf. In another example, Steps 1210 and 1212 may compare the at least one image of Step 1210 and the one or more images of the retail shelf captured using the at least one image sensor before the engagement of the person with the retail shelf to determine the change associated with the retail shelf during the engagement of the person with the retail shelf.
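A simplified sketch of the before/after comparison described above is given below; product detection is abstracted into (product type, slot) tuples that some upstream image analysis would produce, and only the differencing logic is illustrated. The tuple format and product names are assumptions for the example.

from collections import Counter

def shelf_change(before: list[tuple[str, int]], after: list[tuple[str, int]]) -> dict:
    # Return products removed from, added to, or moved on the retail shelf during the engagement.
    before_types, after_types = Counter(t for t, _ in before), Counter(t for t, _ in after)
    removed = before_types - after_types
    added = after_types - before_types
    moved = [t for (t, p_before) in before
             if (t, p_before) not in after and t in after_types and t not in removed]
    return {"removed": dict(removed), "added": dict(added), "moved": moved}

# Usage with hypothetical detections (product type, slot index):
before = [("cereal_A", 0), ("cereal_A", 1), ("soda_B", 5)]
after = [("cereal_A", 1), ("soda_B", 6)]
print(shelf_change(before, after))
# {'removed': {'cereal_A': 1}, 'added': {}, 'moved': ['soda_B']}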

In some examples, Step 1204 may further comprise analyzing the first infrared input data received by Step 1202 to determine a type of the engagement of the person with the retail shelf. For example, a classification model may be used to analyze the first infrared input data received by Step 1202 and classify it to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different type of engagement. In one example, in response to a first determined type of the engagement, Step 1210 may trigger the analyzing of the at least one image of the retail shelf, and in response to a second determined type of the engagement, method 1200 may forgo analyzing the at least one image of the retail shelf. In another example, in response to a first determined type of the engagement, Step 1210 may include a first analysis step in the analysis of the at least one image of the retail shelf (and may exclude a second analysis step from the analysis of the at least one image of the retail shelf), and in response to a second determined type of the engagement, Step 1210 may include the second analysis step in the analysis of the at least one image of the retail shelf (and may exclude the first analysis step from the analysis of the at least one image of the retail shelf), the second analysis step may differ from the first analysis step. In one example, the first type of engagement may include a physical contact (for example, with items placed on the retail shelf, with the retail shelf, with items associated with the retail shelf, etc.), and the second type of engagement may include no physical contact. In another example, the first type of engagement may include engagement associated with a first portion of the retail shelf, and the second type of engagement may include engagement associated with a second portion of the retail shelf. In yet another example, the first type of engagement may include engagement from a first distance, and the second type of engagement may include engagement from a second distance. In an additional example, the first type of engagement may include engagement associated with a first time duration, and the second type of engagement may include engagement associated with a second time duration.

In some examples, for example in response to the detected engagement of a person with a retail shelf, method 1200 may analyze one or more images of the retail shelf captured before the completion of the engagement of the person with the retail shelf to determine at least one aspect of the engagement. For example, the at least one aspect of the engagement may include a change associated with the retail shelf during the engagement of the person with the retail shelf, as described above. In another example, the at least one aspect of the engagement may include at least one of a product type associated with the engagement (such as a product type of a product taken from the retail shelf during the engagement, a product type of a product placed on the retail shelf during the engagement, a product type of a product moved from one location to another on the retail shelf during the engagement, etc.), a quantity of products associated with the engagement (such as a quantity of products taken from the retail shelf during the engagement, a quantity of products placed on the retail shelf during the engagement, a quantity of products moved from one location to another on the retail shelf during the engagement, etc.), and so forth. In one example, method 1200 may further comprise updating a virtual shopping cart associated with the person based on the determined at least one aspect of the engagement (for example, based on the determined product type, based on the determined quantity of products, and so forth). In one example, Step 1212 may further comprise using the analysis of the at least one image captured after the completion of the engagement of the person with the retail shelf and the determined at least one aspect of the engagement to determine the state of the retail shelf.

In some examples, systems, methods and computer-readable media for triggering image processing based on vibration data analysis are provided.

FIG. 13 provides a flowchart of an exemplary method 1300 for triggering image processing based on vibration data analysis, consistent with the present disclosure. In this example, method 1300 may comprise receiving vibration data captured using one or more vibration sensors mounted to a shelving unit including a plurality of retail shelves (Step 1302); analyzing the vibration data to determine whether a vibration is a result of an engagement of a person with at least one retail shelf of the plurality of retail shelves (Step 1304); in response to a determination that the vibration is a result of the engagement of the person with the at least one retail shelf of the plurality of retail shelves, triggering analysis of at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves (Step 1306); in response to a determination that the vibration is not a result of the engagement of the person with the at least one retail shelf of the plurality of retail shelves, forgoing triggering the analysis of the at least one image (Step 1308); and providing information based on a result of the analysis of the at least one image of the at least part of the plurality of retail shelves (Step 1310).

In some examples, Step 1302 may comprise receiving vibration data captured using one or more vibration sensors mounted to a shelving unit including a plurality of retail shelves. For example, receiving the vibration data by Step 1302 may comprise at least one of reading the vibration data, receiving the vibration data from an external device (for example, using a digital communication device), capturing the vibration data using the one or more vibration sensors, and so forth.

In some examples, Step 1304 may comprise analyzing the vibration data to determine whether a vibration is a result of an engagement of a person with at least one retail shelf of the plurality of retail shelves. In one example, a machine learning model may be trained using training examples to determine whether vibrations are a result of engagement of people with retail shelves. An example of such training example may include sample vibration data, together with a label indicating whether the sample vibration data corresponds to engagement of people with retail shelves. In one example, Step 1304 may use the trained machine learning model to analyze the vibration data received by Step 1302 to determine whether the vibration is the result of an engagement of a person with at least one retail shelf of the plurality of retail shelves. In another example, Step 1304 may compare the vibration data or a preprocessed version of the vibration data (such as a function of the vibration data) with a threshold, and may use a result of the comparison to determine whether the vibration is the result of an engagement of a person with at least one retail shelf of the plurality of retail shelves. For example, the threshold may differentiate between ambient vibrations from an environment of the retail shelf and vibrations originating from the retail shelf. In an additional example, the threshold may be selected based on a statistical measure of historic vibration data captured using the one or more vibration sensors of Step 1302 over time. In some examples, Step 1304 may calculate a convolution of at least part of the vibration data received by Step 1302. Further, in response to a first value of the calculated convolution of the at least part of the vibration data, Step 1304 may determine that the vibration is the result of an engagement of a person with at least one retail shelf of the plurality of retail shelves, and in response to a second value of the calculated convolution of the at least part of the vibration data, Step 1304 may determine that the vibration is not the result of an engagement of a person with at least one retail shelf of the plurality of retail shelves.
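As a non-limiting illustration of the convolution-and-threshold variant of Step 1304 described above, the following Python sketch convolves the vibration signal with a moving-average kernel and compares its peak with a threshold derived from historic data; the kernel size and the mean-plus-three-standard-deviations rule are assumptions made for the example.

```python
# Illustrative sketch: deciding whether a vibration is a result of an
# engagement by convolving the raw signal with a smoothing kernel and
# comparing the peak of the convolved signal with a threshold.
import numpy as np

def is_engagement(vibration: np.ndarray,
                  threshold: float,
                  kernel_size: int = 16) -> bool:
    """vibration: 1-D array of sensor samples; returns True when the
    convolved signal exceeds the threshold (first vs. second value)."""
    kernel = np.ones(kernel_size) / kernel_size          # moving-average kernel
    convolved = np.convolve(np.abs(vibration), kernel, mode="same")
    return bool(convolved.max() > threshold)

def threshold_from_history(history: np.ndarray) -> float:
    """One possible statistical measure of historic vibration data:
    mean ambient level plus three standard deviations."""
    return float(history.mean() + 3.0 * history.std())
```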

In some examples, Step 1306 may comprise, for example in response to a determination by Step 1304 that the vibration is the result of the engagement of the person with the at least one retail shelf of the plurality of retail shelves, triggering analysis of at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves. In some examples, Step 1308 may comprise, for example in response to a determination by Step 1304 that the vibration is not the result of the engagement of the person with the at least one retail shelf of the plurality of retail shelves, forgoing triggering the analysis of the at least one image. In some examples, the triggering of the analysis of the at least one image may comprise transmitting a signal (for example to an external device) configured to cause the analysis of the at least one image (for example by the external device), performing the analysis of the at least one image, storing a selected value at a selected location in a memory configured to cause another process to perform the analysis of the at least one image, and so forth. The analysis of the at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves may include any image analysis described herein. For example, Step 1306 may analyze the at least one image of the at least part of the plurality of retail shelves using at least one of image processing instructions 232, Step 724, Step 726 and Step 728. In another example, Step 1306 may analyze the at least one image of the at least part of the plurality of retail shelves using any of the techniques for analyzing image data described above. In yet another example, Step 1306 may analyze the at least one image of the at least part of the plurality of retail shelves using at least one of an image classification algorithm, an object recognition algorithm, a product recognition algorithm, a label recognition algorithm, a logo recognition algorithm and a semantic segmentation algorithm. In some examples, a machine learning model may be trained using training examples to analyze images. An example of such training example may include a sample image, together with a label indicating a desired outcome corresponding to the analysis of the sample image. In one example, Step 1306 may use the trained machine learning model to analyze the at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves to obtain an outcome of the analysis. In some examples, Step 1306 may use an artificial neural network to analyze the at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves to obtain an outcome of the analysis, for example as described above. In some examples, Step 1306 may base the analysis of the at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves on a calculated convolution of at least part of the at least one image.
Additionally or alternatively to triggering analysis of at least one image, Step 1306 may comprise, for example in response to the determination by Step 1304 that the vibration is the result of the engagement of the person with the at least one retail shelf, triggering capturing of the at least one image of the at least part of the plurality of retail shelves, and in some examples, Step 1308 may comprise, for example in response to the determination by Step 1304 that the vibration is not the result of the engagement of the person with the at least one retail shelf, forgoing triggering the capturing of the at least one image.

In some examples, Step 1310 may comprise providing information based on a result of the analysis triggered by Step 1306 of the at least one image of the at least part of the plurality of retail shelves. For example, providing the information based on the result of the analysis triggered by Step 1306 of the at least one image of the at least part of the plurality of retail shelves may comprise at least one of storing the information in memory, transmitting the information to an external device, providing the information to a user (for example, visually, audibly, textually, through a user interface, etc.), and so forth.

In some examples, the plurality of retail shelves of method 1300 may include at least a first retail shelf and a second retail shelf. Additionally or alternatively to Step 1304, method 1300 may comprise analyzing the vibration data to determine that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves. In one example, a machine learning model may be trained using training examples to determine particular retail shelves corresponding to engagement of people from vibration data. An example of such training example may include sample vibration data, together with a label indicating, of a plurality of alternative retail shelves, the particular retail shelf corresponding to the engagement associated with the sample vibration data. In one example, method 1300 may use the trained machine learning model to analyze the vibration data received by Step 1302 to determine that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves. In another example, method 1300 may compare the vibration data or a preprocessed version of the vibration data (such as a function of the vibration data) with a threshold, and may use a result of the comparison to determine that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves. In some examples, method 1300 may calculate a convolution of at least part of the vibration data received by Step 1302. Further, in response to a first value of the calculated convolution of the at least part of the vibration data, method 1300 may determine that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves, and in response to a second value of the calculated convolution of the at least part of the vibration data, method 1300 may determine that the vibration is not a result of an engagement with the first retail shelf of the plurality of retail shelves and/or that the vibration is a result of an engagement with the second retail shelf of the plurality of retail shelves. Further, in some examples, for example in response to the determination that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves, method 1300 may avoid including images depicting the second retail shelf in the at least one image of Steps 1306, 1308 and 1310.
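Purely for illustration, the following sketch shows one possible way a trained model could attribute a vibration to a particular retail shelf of the shelving unit, as described above; the spectral-band featurization and the choice of a logistic regression classifier are assumptions, not part of the disclosure.

```python
# Illustrative sketch: attributing a vibration to a particular shelf by
# training a classifier on labeled vibration snippets.
import numpy as np
from sklearn.linear_model import LogisticRegression

def spectral_features(vibration: np.ndarray, bands: int = 8) -> np.ndarray:
    """Split the magnitude spectrum into equal bands and sum each band."""
    spectrum = np.abs(np.fft.rfft(vibration))
    return np.array([chunk.sum() for chunk in np.array_split(spectrum, bands)])

def train_shelf_attribution(snippets, shelf_labels):
    """snippets: list of 1-D vibration arrays; shelf_labels: shelf indices."""
    X = np.stack([spectral_features(s) for s in snippets])
    model = LogisticRegression(max_iter=1000)
    model.fit(X, np.asarray(shelf_labels))
    return model

def attribute_vibration(model, vibration: np.ndarray) -> int:
    """Return the index of the shelf the engagement is attributed to."""
    return int(model.predict(spectral_features(vibration)[None, :])[0])
```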

In some examples, the at least one image of method 1300 may be captured using at least one image sensor mounted to a retail shelf not included in the at least one retail shelf. In one example, the retail shelf not included in the at least one retail shelf may be on an opposite side of an aisle from the at least one retail shelf, for example as illustrated in FIG. 4A and FIG. 6A. In another example, the retail shelf not included in the at least one retail shelf may be positioned above the at least one retail shelf. In some examples, the retail shelf not included in the at least one retail shelf may be positioned above the at least one retail shelf and the at least one image sensor may be positioned below that retail shelf. In some examples, the at least one image of method 1300 may be captured using at least one image sensor mounted to an image capturing robot (for example, a wheeled robot such as capturing device 125G, a legged robot, a snake-like robot, and so forth). In some examples, the at least one image of method 1300 may be captured using at least one image sensor mounted to a ceiling of a retail store. In some examples, the at least one image of method 1300 may be captured using at least one image sensor included in a personal mobile device, such as capturing device 125D.

Additionally or alternatively to determining whether the vibration is the result of an engagement of a person with the at least one retail shelf, Step 1304 may analyze the vibration data received by Step 1302 to determine a type of the engagement of the person with the at least one retail shelf. For example, a classification model may be used to analyze the vibration data received by Step 1302 and classify it to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different type of engagement. In one example, in response to a first determined type of the engagement, Step 1306 may trigger the analysis of the at least one image of the at least part of the plurality of retail shelves, and in response to a second determined type of the engagement, Step 1308 may forgo triggering the analysis of the at least one image of the at least part of the plurality of retail shelves. In another example, in response to a first determined type of the engagement, Step 1306 may include a first analysis step in the analysis of the at least one image of the at least part of the plurality of retail shelves (and may exclude a second analysis step from the analysis of the at least one image of the at least part of the plurality of retail shelves), and in response to a second determined type of the engagement, Step 1306 may include the second analysis step in the analysis of the at least one image of the at least part of the plurality of retail shelves (and may exclude the first analysis step from the analysis of the at least one image of the at least part of the plurality of retail shelves), the second analysis step may differ from the first analysis step. In one example, the first type of engagement may include a physical contact (for example, with items placed on the retail shelf, with the retail shelf, with items associated with the retail shelf, etc.), and the second type of engagement may include no physical contact. In another example, the first type of engagement may include engagement associated with a first portion of the at least one retail shelf, and the second type of engagement may include engagement associated with a second portion of the at least one retail shelf. In yet another example, the first type of engagement may include engagement associated with a first type of action, and the second type of engagement may include engagement associated with a second type of action. Some non-limiting examples of such types of actions may include removal of at least one item (such as a product) from the at least one retail shelf, placement of at least one item (such as a product) on the at least one retail shelf, repositioning of at least one item (such as a product) on the at least one retail shelf, and so forth.

In some examples, the at least one image of method 1300 may be at least one image of the at least part of the plurality of retail shelves captured after a completion of the engagement of the person with the at least one retail shelf. In one example, Step 1304 may comprise analyzing the vibration data to determine the completion of the engagement of the person with the at least one retail shelf from the vibration data. In one example, a machine learning model may be trained using training examples to determine completion of engagement of people with retail shelves. An example of such training example may include sample vibration data, together with a label indicating whether the sample vibration data corresponds to completion of engagement of a person with a retail shelf. In one example, Step 1304 may use the trained machine learning model to analyze the vibration data received by Step 1302 to determine the completion of the engagement of the person with the at least one retail shelf. In another example, Step 1304 may compare the vibration data or a preprocessed version of the vibration data (such as a function of the vibration data) with a threshold, and may use a result of the comparison to determine the completion of the engagement of the person with the at least one retail shelf. For example, the threshold may differentiate between ambient vibrations from an environment of the retail shelf and vibrations resulting from such engagement. In an additional example, the threshold may be selected based on a statistical measure of historic vibration data captured using the one or more vibration sensors of Step 1302 over time. In some examples, Step 1304 may calculate a convolution of at least part of the vibration data received by Step 1302. Further, in response to a first value of the calculated convolution of the at least part of the vibration data, Step 1304 may determine the completion of the engagement of the person with the at least one retail shelf, and in response to a second value of the calculated convolution of the at least part of the vibration data, Step 1304 may forgo the determination of the completion of the engagement of the person with the at least one retail shelf.

In some examples, the at least one image of method 1300 may be at least one image of the at least part of the plurality of retail shelves captured after a completion of the engagement of the person with the at least one retail shelf. In some examples, method 1300 may comprise analyzing one or more images of the at least one retail shelf to determine the completion of the engagement of the person with the at least one retail shelf. In one example, a machine learning model may be trained using training examples to determine completion of engagement of people with retail shelves from images. An example of such training example may include a sample image, together with a label indicating whether the sample image corresponds to completion of engagement of a person with a retail shelf. In one example, method 1300 may use the trained machine learning model to analyze the one or more images of the at least one retail shelf to determine the completion of the engagement of the person with the at least one retail shelf. In one example, method 1300 may calculate a convolution of at least part of the one or more images of the at least one retail shelf. Further, in response to a first value of the calculated convolution of the at least part of the one or more images, method 1300 may determine the completion of the engagement of the person with the at least one retail shelf, and in response to a second value of the calculated convolution of the at least part of the one or more images, method 1300 may forgo the determination of the completion of the engagement of the person with the at least one retail shelf. In some examples, method 1300 may analyze infrared data captured using at least one infrared sensor to determine a completion of the engagement of the person with the at least one retail shelf, for example as described above.

In some examples, the at least one image of method 1300 may be at least one image of the at least part of the plurality of retail shelves captured after a completion of the engagement of the person with the at least one retail shelf. Further, in some examples, method 1300 may use the analysis of Step 1306 of the at least one image of the at least part of the plurality of retail shelves to determine a state of at least one retail shelf after the completion of the engagement, for example as described above in relation to Step 1210. In one example, the determined state of the at least one retail shelf may include inventory data associated with products on the at least one retail shelf after the completion of the engagement, and the inventory data may be determined using the analysis of the at least one image by Step 1306, for example as described above in relation to Step 1212. In another example, the determined state of the at least one retail shelf may include facings data associated with products on the at least one retail shelf after the completion of the engagement, and the facings data may be determined using the analysis of the at least one image by Step 1306, for example as described above in relation to Step 1212. In yet another example, the determined state of the at least one retail shelf may include a planogram compliance status of the at least one retail shelf after the completion of the engagement, and the planogram compliance status may be determined using the analysis of the at least one image by Step 1306, for example as described above in relation to Step 1212.

In some examples, the at least one image of method 1300 may be at least one image of the at least part of the plurality of retail shelves captured after a completion of the engagement of the person with the at least one retail shelf. Further, in some examples, method 1300 may use the analysis of the at least one image by Step 1306 and an analysis of one or more images of the at least one retail shelf captured using the at least one image sensor before the engagement to determine a change associated with the at least one retail shelf during the engagement, for example as described above in relation to Steps 1210 and 1212. Some non-limiting examples of such change may include a product placed on the retail shelf, a product moved from one position on the retail shelf to another position on the retail shelf, a product removed from the retail shelf, and so forth.

In some examples, systems, methods and computer-readable media for forgoing image processing in response to infrared data analysis are provided.

FIG. 14 provides a flowchart of an exemplary method 1400 for forgoing image processing in response to infrared data analysis, consistent with the present disclosure. In this example, method 1400 may comprise receiving infrared input data captured using one or more infrared sensors (Step 1402); analyzing the infrared input data to detect a presence of an object in an environment of a retail shelf (Step 1404); in response to no detected presence of an object in the environment of the retail shelf, analyzing at least one image of the retail shelf captured using at least one image sensor (Step 1406); and in response to a detection of presence of an object in the environment of the retail shelf, forgoing analyzing the at least one image of the retail shelf captured using the at least one image sensor (Step 1408). In one example, the environment of the retail shelf may be, include, or be included in an area between the at least one image sensor and at least part of the retail shelf, for example an area in which the presence of an opaque object will cause an occlusion of at least part of the retail shelf in at least one image.

In some examples, Step 1402 may comprise receiving infrared input data captured using one or more infrared sensors. For example, receiving the infrared input data by Step 1402 may comprise at least one of reading the infrared input data, receiving the infrared input data from an external device (for example, using a digital communication device), capturing the infrared input data using the one or more infrared sensors, and so forth. In some examples, the one or more infrared sensors may be at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors and photovoltaic infrared sensors. In one example, the one or more infrared sensors may be one or more passive infrared sensors. In some examples, the one or more infrared sensors may be one or more infrared sensors positioned below a second retail shelf. In one example, the second retail shelf may be positioned above the retail shelf. For example, the one or more infrared sensors may be one or more infrared sensors mounted to the second retail shelf, mounted to a surface (for example, of a wall, of a rack, etc.) connecting the second retail shelf and the retail shelf, and so forth. In some examples, the one or more infrared sensors may be one or more infrared sensors mounted to a second retail shelf. In one example, the second retail shelf may be positioned on an opposite side of an aisle from the retail shelf.

In some examples, Step 1404 may comprise analyzing the infrared input data received by Step 1402 to detect a presence of an object in an environment of a retail shelf. In one example, a machine learning model may be trained using training examples to detect presence of objects in environments from infrared data. An example of such training example may include sample infrared data, together with a label indicating whether the sample infrared data corresponds to a presence of an object in an environment. In one example, Step 1404 may use the trained machine learning model to analyze the infrared input data received by Step 1402 to detect the presence of the object in the environment of the retail shelf. In another example, Step 1404 may compare the infrared input data or a preprocessed version of the infrared input data (such as a function of the infrared input data) with a threshold, and may use a result of the comparison to detect the presence of the object in the environment of the retail shelf. For example, the threshold may differentiate between an ambient temperature of an environment of the retail shelf and a typical human body temperature, or between typical temperatures of a refrigeration unit including the retail shelf and an ambient temperature. In an additional example, the threshold may be selected based on a statistical measure of infrared data captured using the one or more infrared sensors of Step 1402 over time. In some examples, Step 1404 may calculate a convolution of at least part of the infrared input data received by Step 1402. Further, in response to a first value of the calculated convolution of the at least part of the infrared input data, Step 1404 may detect the presence of the object in the environment of the retail shelf, and in response to a second value of the calculated convolution of the at least part of the infrared input data, Step 1404 may avoid detecting the presence of the object in the environment of the retail shelf. In some examples, the one or more infrared sensors may be one or more infrared sensors physically coupled with the at least one image sensor (such as capturing devices 125A, 125B, and 125C as illustrated in FIG. 4A). For example, a common housing may include both the one or more infrared sensors and the at least one image sensor. In another example, the one or more infrared sensors may be physically connected to the at least one image sensor, for example with at least one wire, with a power cable, with a data cable, with a bracket, and so forth. In yet another example, the one or more infrared sensors and the at least one image sensor may be physically connected to a third housing, such as housing 504J or housing 5041. For example, the third housing may include a processing unit, may include memory, may include a wireless communication device, may include a power source, and so forth. In some examples, the object of Step 1404 may include at least one of a person, a robot, and an inanimate object. Other non-limiting examples of the object of Step 1404 may include a shopping cart, a ladder and a pallet jack.
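As a non-limiting sketch of the threshold-based variant of Step 1404 described above, the following code detects presence by thresholding passive infrared temperature readings between an assumed ambient temperature and a typical human body temperature, and then gates the image analysis of Steps 1406 and 1408 accordingly; all numeric values are assumptions made for the example.

```python
# Illustrative sketch: presence detection from infrared readings and the
# resulting decision to analyze or forgo analyzing the shelf image.
import numpy as np

AMBIENT_C = 21.0        # assumed ambient temperature of the environment
BODY_C = 36.0           # typical human body temperature

def object_present(ir_readings: np.ndarray,
                   threshold_c: float = (AMBIENT_C + BODY_C) / 2.0) -> bool:
    """ir_readings: temperatures (deg C) reported by the infrared sensors.
    Presence is detected when any reading crosses the midpoint threshold."""
    return bool(np.max(ir_readings) > threshold_c)

def maybe_analyze(ir_readings, image, analyze_image):
    """Analyze the image only when nothing occludes the shelf."""
    if object_present(ir_readings):
        return None                      # forgo analysis (as in Step 1408)
    return analyze_image(image)          # analyze the image (as in Step 1406)
```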

In some examples, Step 1406 may comprise, for example in response to no detected presence of an object in the environment of the retail shelf by Step 1404, analyzing at least one image of the retail shelf captured using at least one image sensor. In some examples, Step 1408 may comprise, for example in response to a detection of presence of an object in the environment of the retail shelf by Step 1404, forgoing analyzing the at least one image of the retail shelf captured using the at least one image sensor. In some examples, analyzing at least one image of the retail shelf captured using at least one image sensor by Step 1406 may include any image analysis described herein. For example, Step 1406 may analyze the at least one image using at least one of image processing instructions 232, Step 724, Step 726 and Step 728. In another example, Step 1406 may analyze the at least one image using any of the techniques for analyzing image data described above. In yet another example, Step 1406 may analyze the at least one image using at least one of an image classification algorithm, an object recognition algorithm, a product recognition algorithm, a label recognition algorithm, a logo recognition algorithm and a semantic segmentation algorithm. In some examples, a machine learning model may be trained using training examples to analyze images. An example of such training example may include a sample image, together with a label indicating a desired outcome corresponding to the analysis of the sample image. In one example, Step 1406 may use the trained machine learning model to analyze the at least one image to obtain an outcome of the analysis. In some examples, Step 1406 may use an artificial neural network to analyze the at least one image to obtain an outcome of the analysis, for example as described above. In some examples, Step 1406 may base the analysis of the at least one image on a calculated convolution of at least part of the at least one image. Additionally or alternatively to triggering analysis of at least one image, Step 1406 may comprise, for example in response to no detected presence of an object in the environment of the retail shelf by Step 1404, triggering capturing of the at least one image, and in some examples, Step 1408 may comprise, for example in response to a detection of presence of an object in the environment of the retail shelf by Step 1404, forgoing triggering the capturing of the at least one image.

In some examples, the at least one image sensor of Step 1406 and Step 1408 may be at least one image sensor mounted to a second retail shelf. In one example, the second retail shelf may be on an opposite side of an aisle from the retail shelf, for example as illustrated in FIG. 4A and FIG. 6A. In another example, the second retail shelf may be positioned above the retail shelf. In some examples, the second retail shelf may be positioned above the retail shelf and the at least one image sensor may be positioned below the second retail shelf. In some examples, the at least one image sensor of Step 1406 and Step 1408 may be at least one image sensor mounted to an image capturing robot (for example, a wheeled robot such as capturing device 125G, a legged robot, a snake-like robot, and so forth). In some examples, the at least one image sensor of Step 1406 and Step 1408 may be at least one image sensor mounted to a ceiling of a retail store. In some examples, the at least one image sensor of Step 1406 and Step 1408 may be part of a personal mobile device, such as capturing device 125D.

In some examples, method 1400 may further comprise using the analysis of the at least one image by Step 1406 to determine a state of the retail shelf, for example as described above in relation to Step 1210. In one example, the determined state of the retail shelf may include inventory data associated with products on the retail shelf, and the inventory data may be determined using the analysis of the at least one image by Step 1406, for example as described above in relation to Step 1212. In another example, the determined state of the retail shelf may include facings data associated with products on the retail shelf, and the facings data may be determined using the analysis of the at least one image by Step 1406, for example as described above in relation to Step 1212. In yet another example, the determined state of the retail shelf may include a planogram compliance status of the retail shelf, and the planogram compliance status may be determined using the analysis of the at least one image by Step 1406, for example as described above in relation to Step 1212.

In some examples, Step 1404 may analyze the infrared input data to determine a portion of a field of view of the at least one image sensor associated with the object, for example using a regression model, using a semantic segmentation model, using a background subtraction model, and so forth. Further, in some examples, in response to a first portion of the field of view of the at least one image sensor associated with the object determined by Step 1404, Step 1406 may analyze the at least one image of the retail shelf captured using the at least one image sensor, and in response to a second portion of the field of view of the at least one image sensor associated with the object determined by Step 1404, Step 1408 may forgo analyzing the at least one image of the retail shelf captured using the at least one image sensor. In one example, the field of view of the at least one image sensor may differ from the field of view of the one or more infrared sensors. In another example, the field of view of the at least one image sensor and the field of view of the one or more infrared sensors may be identical or substantially identical. In some examples, Step 1404 may analyze the infrared input data to determine a type of the object, for example using an object recognition algorithm, using a classification model, and so forth. Further, in some examples, in response to a first type of the object determined by Step 1404, Step 1406 may analyze the at least one image of the retail shelf captured using the at least one image sensor, and in response to a second type of the object determined by Step 1404, Step 1408 may forgo analyzing the at least one image of the retail shelf captured using the at least one image sensor.

In some examples, Step 1404 may analyze the infrared input data to determine a duration associated with the presence of an object in the environment of the retail shelf, for example using a regression model, using a Markov model, using a Viterbi algorithm, and so forth. In some examples, method 1400 may further comprise comparing the duration determined by Step 1404 with a threshold. Further, in response to a first result of the comparison, Step 1406 may analyze the at least one image of the retail shelf captured using the at least one image sensor, and in response to a second result of the comparison, Step 1408 may forgo analyzing the at least one image of the retail shelf captured using the at least one image sensor. In one example, the threshold may be selected based on at least one product type associated with the retail shelf. For example, in response to a first product type associated with the retail shelf, a first threshold may be selected, and in response to a second product type associated with the retail shelf, a second threshold may be selected, the second threshold may differ from the first threshold. In one example, the threshold may be selected based on a status of the retail shelf determined using image analysis (for example using Steps 1210 and 1212 or using method 1200) of one or more images of the retail shelf captured using the at least one image sensor before the capturing of the infrared input data by Step 1402. For example, in response to a first status of the retail shelf, a first threshold may be selected, and in response to a second status of the retail shelf, a second threshold may be selected, the second threshold may differ from the first threshold. In one example, the threshold may be selected based on a time of day. For example, in response to a first time of day, a first threshold may be selected, and in response to a second time of day, a second threshold may be selected, the second threshold may differ from the first threshold.
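Purely as an illustration of selecting the duration threshold described above, the following sketch picks a threshold from an assumed product type, shelf status and time of day; the specific categories, numeric values, and the direction of the final comparison are assumptions made for the example.

```python
# Illustrative sketch: selecting a presence-duration threshold and using it
# to decide whether to analyze the shelf image.
from datetime import datetime

def select_duration_threshold(product_type: str,
                              shelf_status: str,
                              now: datetime) -> float:
    """Return a presence-duration threshold in seconds (assumed values)."""
    threshold = 5.0                              # default
    if product_type == "high_consideration":    # e.g. wine, cosmetics
        threshold = 20.0
    if shelf_status == "low_stock":              # restocking visits take longer
        threshold = max(threshold, 30.0)
    if 17 <= now.hour <= 20:                     # assumed busy evening hours
        threshold *= 0.5
    return threshold

def should_analyze(duration_s: float, threshold_s: float) -> bool:
    """One possible convention: a short, transient presence still allows the
    image analysis to proceed, while a longer presence forgoes it."""
    return duration_s < threshold_s
```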

In some examples, method 1400 may further comprise, in response to no presence of an object in the environment of the retail shelf detected by Step 1404, capturing the at least one image of the retail shelf using the at least one image sensor, and in response to a detection of presence of an object in the environment of the retail shelf by Step 1404, forgoing the capturing of the at least one image of the retail shelf.

Using only one type of modality (such as image data, infrared data, vibration data, etc.) to detect and/or recognize actions may result in unsatisfactory results, such as low accuracy, low precision, low sensitivity, results with low confidence levels, failure to successfully determine aspects of the actions (such as a type of an action, a product type associated with an action, a quantity associated with an action, etc.), and so forth. For example, using only image data to detect and/or recognize actions may fail due to image blur, occlusions, insufficient pixel resolution, insufficient frame rate, ambiguity in the visual data, and so forth. In another example, using only infrared data to detect and/or recognize actions may fail due to ambient noise, ambiguity in the infrared data, and so forth. In yet another example, using only vibration data to detect and/or recognize actions may fail due to ambient noise, ambiguity in the vibration data, and so forth. Analyzing data from multiple modalities together to detect and/or recognize actions may improve the results. For example, combining data from multiple modalities may overcome many of the problems faced when using only one modality, and may therefore provide improved accuracy, improved precision and improved sensitivity, provide results with higher confidence levels, enable determination of additional aspects of the actions (such as a type of an action, a product type associated with an action, a quantity associated with an action, etc.), and so forth.

In some examples, systems, methods and computer-readable media for using infrared data analysis and image analysis for robust action recognition in a retail environment are provided.

FIG. 15 provides a flowchart of an exemplary method 1500 for using infrared data analysis and image analysis for robust action recognition in a retail environment, consistent with the present disclosure. In this example, method 1500 may comprise: receiving infrared data captured using one or more infrared sensors from a retail environment (Step 1502); receiving at least one image captured using at least one image sensor from the retail environment (Step 1504); analyzing the infrared data and the at least one image to detect an action performed in the retail environment (Step 1506); and providing information based on the detected action (Step 1508).

In some examples, Step 1502 may comprise receiving infrared data captured using one or more infrared sensors from a retail environment. For example, receiving the infrared data by Step 1502 may comprise at least one of reading the infrared data, receiving the infrared data from an external device (for example, using a digital communication device), capturing the infrared data using the one or more infrared sensors from the retail environment, and so forth. In some examples, the one or more infrared sensors may be at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors and photovoltaic infrared sensors. In one example, the one or more infrared sensors may be one or more passive infrared sensors. In some examples, the one or more infrared sensors may be one or more infrared sensors positioned below a second retail shelf. In one example, the second retail shelf may be positioned above the retail shelf. For example, the one or more infrared sensors may be one or more infrared sensors mounted to the second retail shelf, mounted to a surface (for example, of a wall, of a rack, etc.) connecting the second retail shelf and the retail shelf, and so forth. In some examples, the one or more infrared sensors may be one or more infrared sensors mounted to a second retail shelf. In one example, the second retail shelf may be positioned on an opposite side of an aisle from the retail shelf.

In some examples, Step 1504 may comprise receiving at least one image captured using at least one image sensor from a retail environment (for example, from the retail environment of Step 1502), for example as described above. In some examples, receiving at least one image by Step 1504 may comprise at least one of reading the at least one image, receiving the at least one image from an external device (for example, using a digital communication device), capturing the at least one image using the at least one image sensor from the retail environment, and so forth. In some examples, the at least one image sensor of Step 1504 may be at least one image sensor mounted to a retail shelf, for example as illustrated in FIG. 4A, FIG. 6A and FIG. 6B. In some examples, the at least one image sensor of Step 1504 may be at least one image sensor mounted to an image capturing robot (for example, a wheeled robot such as capturing device 125G, a legged robot, a snake-like robot, and so forth). In some examples, the at least one image sensor of Step 1504 may be at least one image sensor mounted to a ceiling of a retail store. In some examples, the at least one image sensor of Step 1504 may be part of a personal mobile device, such as capturing device 125D. In some examples, the at least one image received by Step 1504 may include at least one three-dimensional image (such as a range image, a stereo image, a depth image, a three-dimensional array of voxels, and so forth).

In some examples, Step 1506 may comprise analyzing the infrared data received by Step 1502 and the at least one image received by Step 1504 to detect an action performed in the retail environment. In some examples, the action may include at least one of picking a product from a retail shelf, placing a product on a retail shelf and moving a product on a retail shelf. Some other non-limiting examples of such action may include placing a label (such as a shelf label), removing a label (such as a shelf label), placing a promotional sign, removing a promotional sign, changing a price, cleaning, restocking, rearranging products, and so forth. In some examples, a machine learning model may be trained using training examples to detect actions from infrared data and images. An example of such training example may include sample infrared data and a sample image, together with a label indicating whether the sample infrared data and the sample image correspond to an action performed in an environment. In one example, Step 1506 may use the trained machine learning model to analyze the infrared data received by Step 1502 and the at least one image received by Step 1504 to detect the action performed in the retail environment. In some examples, Step 1506 may use an artificial neural network to analyze the infrared data received by Step 1502 and the at least one image received by Step 1504 to detect the action performed in the retail environment.
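As a non-limiting illustration of analyzing infrared data and image data together in Step 1506, the following sketch concatenates simple infrared and image features and trains a single classifier over the fused feature vector; the action labels, featurization, and classifier choice are assumptions made for the example only.

```python
# Illustrative sketch: fused infrared + image action detection.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

ACTIONS = ["no_action", "pick_product", "place_product", "move_product"]

def ir_features(ir_series: np.ndarray) -> np.ndarray:
    """Summary statistics of an infrared time series."""
    return np.array([ir_series.mean(), ir_series.std(),
                     ir_series.max(), ir_series.min()])

def image_features(image: np.ndarray) -> np.ndarray:
    """Toy image featurization: per-channel means of a (H, W, 3) image."""
    return image.reshape(-1, 3).mean(axis=0) / 255.0

def train_action_model(ir_samples, images, action_labels):
    """Train one classifier on concatenated infrared and image features."""
    X = np.stack([np.concatenate([ir_features(ir), image_features(img)])
                  for ir, img in zip(ir_samples, images)])
    model = GradientBoostingClassifier()
    model.fit(X, np.asarray(action_labels))
    return model

def detect_action(model, ir_series, image) -> str:
    x = np.concatenate([ir_features(ir_series), image_features(image)])[None, :]
    return ACTIONS[int(model.predict(x)[0])]
```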

In some examples, Step 1506 may calculate a convolution of at least part of the at least one image received by Step 1504 to obtain a value of the calculated convolution, and may use the value of the calculated convolution to analyze the infrared data received by Step 1502 to detect the action performed in the retail environment. For example, Step 1506 may analyze the infrared data received by Step 1502 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the value of the calculated convolution. In another example, in response to a first value of the calculated convolution, Step 1506 may analyze the infrared data received by Step 1502 using a first analysis step to detect the action performed in the retail environment, and in response to a second value of the calculated convolution, Step 1506 may analyze the infrared data received by Step 1502 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.

In some examples, Step 1506 may calculate a convolution of at least part of the infrared data received by Step 1502 to obtain a value of the calculated convolution, and may use the value of the calculated convolution to analyze the at least one image received by Step 1504 to detect the action performed in the retail environment. For example, Step 1506 may analyze the at least one image received by Step 1504 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the value of the calculated convolution. In another example, in response to a first value of the calculated convolution, Step 1506 may analyze the at least one image received by Step 1504 using a first analysis step to detect the action performed in the retail environment, and in response to a second value of the calculated convolution, Step 1506 may analyze the at least one image received by Step 1504 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.

In some examples, the infrared data received by Step 1502 may include a time series of samples captured using the one or more infrared sensors at different points in time. In some examples, Step 1506 may compare two samples of the time series of samples, and may use a result of the comparison to analyze the at least one image received by Step 1504 to detect the action performed in the retail environment. For example, Step 1506 may analyze the at least one image received by Step 1504 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the result of the comparison. In another example, in response to a first result of the comparison, Step 1506 may analyze the at least one image received by Step 1504 using a first analysis step to detect the action performed in the retail environment, and in response to a second result of the comparison, Step 1506 may analyze the at least one image received by Step 1504 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.

In some examples, the at least one image received by Step 1504 may include a plurality of frames of a video captured using the at least one image sensor. In some examples, Step 1506 may compare two frames of the plurality of frames, and may use a result of the comparison to analyze the infrared data received by Step 1502 to detect the action performed in the retail environment. For example, Step 1506 may analyze the infrared data received by Step 1502 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the result of the comparison. In another example, in response to a first result of the comparison, Step 1506 may analyze the infrared data received by Step 1502 using a first analysis step to detect the action performed in the retail environment, and in response to a second result of the comparison, Step 1506 may analyze the infrared data received by Step 1502 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.

In some examples, Step 1506 may analyze the infrared data received by Step 1502 to select a portion of the at least one image received by Step 1504. For example, in response to first infrared data received by Step 1502, Step 1506 may select a first portion of the at least one image received by Step 1504, and in response to second infrared data received by Step 1502, Step 1506 may select a second portion of the at least one image received by Step 1504, the second portion may differ from the first portion. In another example, the infrared data received by Step 1502 may include spatial properties, and Step 1506 may select the portion of the at least one image received by Step 1504 based on the spatial properties. For example, the spatial properties may include an indication of a region in the retail environment, and Step 1506 may select a portion of the at least one image received by Step 1504 corresponding to the indicated region of the retail environment. Further, in some examples, Step 1506 may analyze the selected portion of the at least one image to detect the action performed in the retail environment, for example using the image analysis described above.
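Purely for illustration of selecting a portion of the image based on spatial properties of the infrared data, the following sketch assumes a calibration mapping from each infrared sensor to a pixel region of the camera's field of view and crops the region facing the strongest infrared response; the mapping and the argmax rule are assumptions made for the example.

```python
# Illustrative sketch: cropping the image region indicated by the infrared
# data before handing it to the action-recognition analysis.
import numpy as np

# Assumed calibration: each infrared sensor index maps to a pixel box
# (top, left, bottom, right) in the camera's field of view.
SENSOR_TO_REGION = {
    0: (0, 0, 480, 640),
    1: (0, 640, 480, 1280),
}

def select_image_portion(image: np.ndarray, ir_readings: np.ndarray) -> np.ndarray:
    """Crop the image region facing the infrared sensor with the strongest
    response."""
    strongest = int(np.argmax(ir_readings))
    top, left, bottom, right = SENSOR_TO_REGION[strongest]
    return image[top:bottom, left:right]
```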

In some examples, Step 1506 may comprise analyzing the infrared data received by Step 1502 to attempt to detect the action performed in the retail environment, for example using a pattern recognition algorithm. In some examples, for example in response to a failure of the attempt to successfully detect the action, Step 1506 may analyze the at least one image received by Step 1504 to detect the action performed in the retail environment, for example using a visual action recognition algorithm. In one example, for example in response to a failure to successfully detect the action, method 1500 may trigger the capturing of the at least one image using the at least one image sensor. In one example, the failure to successfully detect the action may be a failure to successfully detect the action at a confidence level higher than a selected threshold. In another example, the failure to successfully detect the action may be a failure to determine at least one aspect of the action. Some non-limiting examples of such aspect may include at least one of a type of the action, a product type associated with the action, and a quantity of products associated with the action.
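As a minimal sketch of the fallback behavior described above, the following code attempts to detect the action from the infrared data alone and analyzes the image only when that attempt does not reach a selected confidence level; the detector functions and the threshold value are placeholders assumed for the example.

```python
# Illustrative sketch: infrared-first detection with an image-analysis
# fallback on failure to reach a selected confidence level.
CONFIDENCE_THRESHOLD = 0.8   # assumed value

def detect_action_robust(ir_data, image, detect_from_ir, detect_from_image):
    """detect_from_ir / detect_from_image are assumed callables returning
    (action, confidence)."""
    action, confidence = detect_from_ir(ir_data)
    if action is not None and confidence >= CONFIDENCE_THRESHOLD:
        return action
    # Failure to successfully detect the action from infrared data alone:
    # analyze the image (this point could also trigger capturing the image).
    action, _ = detect_from_image(image)
    return action
```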

In some examples, Step 1508 may comprise providing information based on the action detected by Step 1506. For example, providing the information based on the action detected by Step 1506 may comprise at least one of storing the information in memory, transmitting the information to an external device, providing the information to a user (for example, visually, audibly, textually, through a user interface, etc.), and so forth.

In some examples, detecting the action performed in the retail environment by Step 1506 may further include recognizing a type of the action. For example, Step 1506 may use a classification model to classify the action to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different type of action. In another example, Step 1506 may analyze the infrared data received by Step 1502 and the at least one image received by Step 1504 (for example using the classification model, using a machine learning model trained using training examples to recognize types of actions from records including both infrared data and images, using an artificial neural network, and so forth) to recognize the type of the action. Some non-limiting examples of such types of actions may include picking an item, picking a product, placing an item, placing a product, moving an item, moving a product, placing a label (such as a shelf label), removing a label (such as a shelf label), placing a promotional sign, removing a promotional sign, changing a price, cleaning, restocking, rearranging products, and so forth. Further, in some examples, the information provided by Step 1508 may be based on the type of the action. In one example, the information provided by Step 1508 may include an indication of the type of the action. In one example, in response to a first type of the action, Step 1508 may provide first information, and in response to a second type of the action, Step 1508 may provide second information, the second information may differ from the first information. In one example, in response to a first type of the action, Step 1508 may provide the information, and in response to a second type of the action, Step 1508 may forgo providing the information.

In some examples, detecting the action performed in the retail environment by Step 1506 may further include identifying a product type associated with the action. For example, Step 1506 may use a classification model to classify the action to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different product type. In another example, Step 1506 may analyze the infrared data received by Step 1502 and the at least one image received by Step 1504 (for example using the classification model, using a machine learning model trained using training examples to identify product types of products associated with actions from records including both infrared data and images, using an artificial neural network, and so forth) to identify the product type. In one example, the action may include at least one of picking, placing and moving a product, and the product type associated with the action may be a product type of the product. In one example, the action may include at least one of placing and removing a label (such as a shelf label), and the product type associated with the action may be a product type indicated by the label (for example, by text printed on the label, by a logo on the label, by a picture on the label, by a visual code on the label, and so forth). In one example, the action may include at least one of placing and removing a promotional sign, and the product type associated with the action may be a product type associated with the promotional sign. In one example, the action may include changing a price of products of a particular product type, and the product type associated with the action may be the particular product type. Further, in some examples, the information provided by Step 1508 may be based on the product type associated with the action. In one example, the information provided by Step 1508 may include an indication of the product type (for example, a textual indication, a picture of a product of the product type, a barcode associated with the product type, and so forth). In one example, in response to a first product type associated with the action, Step 1508 may provide first information, and in response to a second product type associated with the action, Step 1508 may provide second information, the second information may differ from the first information. In one example, in response to a first product type associated with the action, Step 1508 may provide the information, and in response to a second product type associated with the action, Step 1508 may forgo providing the information.

In some examples, detecting the action performed in the retail environment by Step 1506 may further include determining a quantity of products associated with the action. For example, Step 1506 may use a regression model to determine the quantity of products associated with the action. In another example, Step 1506 may analyze the infrared data received by Step 1502 and the at least one image received by Step 1504 (for example using the classification model, using a machine learning model trained using training examples to determine quantities of products associated with actions from records including both infrared data and images, using an artificial neural network, and so forth) to determine the quantity of products associated with the action. In one example, the action may include at least one of picking, placing and moving at least one product, and the quantity of products associated with the action may be the quantity of products picked, placed and/or moved in the action. In one example, the action may include at least one of placing and removing a promotional sign, and the quantity of products associated with the action may be a quantity of products indicated in the promotional sign. Further, in some examples, the information provided by Step 1508 may be based on the quantity of products associated with the action. In one example, the information provided by Step 1508 may include an indication of the quantity of products associated with the action. In one example, in response to a first quantity of products associated with the action, Step 1508 may provide first information, and in response to a second quantity of products associated with the action, Step 1508 may provide second information, the second information may differ from the first information. In one example, in response to a first quantity of products associated with the action, Step 1508 may provide the information, and in response to a second quantity of products associated with the action, Step 1508 may forgo providing the information.
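For illustration only, the following is a minimal sketch (not part of the original disclosure) of how a regression model of the kind mentioned above could estimate a quantity of products from combined infrared and image data. The feature-extraction choices, the Ridge regressor, the 8-bit image assumption, and the structure of the training records are assumptions introduced here rather than details from the disclosure.

```python
import numpy as np
from sklearn.linear_model import Ridge

def extract_features(infrared_samples, image):
    # Hypothetical joint features: summary statistics of the infrared time
    # series concatenated with a coarse intensity histogram of the image
    # (assumed to use 8-bit pixel values).
    ir = np.asarray(infrared_samples, dtype=float)
    ir_feats = [ir.mean(), ir.std(), ir.max() - ir.min()]
    hist, _ = np.histogram(np.asarray(image, dtype=float), bins=16, range=(0, 255))
    return np.concatenate([ir_feats, hist / max(hist.sum(), 1)])

def train_quantity_regressor(records, quantities):
    # records: iterable of (infrared_samples, image) pairs for labeled actions;
    # quantities: the known quantity of products for each record.
    X = np.stack([extract_features(ir, img) for ir, img in records])
    return Ridge(alpha=1.0).fit(X, np.asarray(quantities, dtype=float))

def estimate_quantity(model, infrared_samples, image):
    return float(model.predict(extract_features(infrared_samples, image)[None, :])[0])
```

In this sketch, estimate_quantity would be called with the infrared samples and the image associated with a detected action to obtain a quantity estimate.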

In some examples, the infrared data received by Step 1502 may include a time series of samples captured using the one or more infrared sensors at different points in time. In some examples, Step 1504 may further comprise analyzing the time series of the samples captured using the one or more infrared sensors at the different points in time to select the at least one image of a plurality of images. For example, in response to a first result of the analysis of the time series of samples, Step 1504 may select a first subgroup of the plurality of images, and in response to a second result of the analysis of the time series of samples, Step 1504 may select a second subgroup of the plurality of images, the second subgroup may differ from the first subgroup. In another example, Step 1504 may analyze the time series of the samples captured using the one or more infrared sensors at the different points in time to select a particular point in time (for example, a point in time corresponding to an extremum of the samples, a point in time corresponding to a sample satisfying a particular criterion, and so forth), each image of the plurality of images may correspond to a different point in time (for example, based on the capturing time of the image), and Step 1504 may select the image of the plurality of images corresponding to the particular point in time (or corresponding to a point in time nearest to the particular point in time of the points in time corresponding to the plurality of images).

In some examples, Step 1506 may calculate a convolution of at least part of the at least one image to obtain a value of the calculated convolution. Further, in some examples, Step 1506 may analyze the infrared data to determine a wavelength associated with the infrared data. For example, the wavelength associated with the infrared data may be the most prominent wavelength in the infrared data, the most prominent wavelength in a selected range of wavelengths in the infrared data, the second most prominent wavelength in the infrared data, and so forth. In one example, in response to a first combination of the value of the calculated convolution and the wavelength associated with the infrared data, Step 1506 may detect the action performed in the retail environment, and in response to a second combination of the value of the calculated convolution and the wavelength associated with the infrared data, Step 1506 may forgo the detection of the action performed in the retail environment. In another example, in response to a first combination of the value of the calculated convolution and the wavelength associated with the infrared data, Step 1506 may determine a first type of the action performed in the retail environment, and in response to a second combination of the value of the calculated convolution and the wavelength associated with the infrared data, Step 1506 may determine a second type of the action performed in the retail environment, the second type may differ from the first type.

In some examples, systems, methods and computer-readable media for using vibration data analysis and image analysis for robust action recognition in retail environment are provided.

FIG. 16 provides a flowchart of an exemplary method 1600 for using vibration data analysis and image analysis for robust action recognition in retail environment, consistent with the present disclosure. In this example, method 1600 may comprise: receiving vibration data captured using one or more vibration sensors mounted to a shelving unit including at least one retail shelf (Step 1602); receiving at least one image captured using at least one image sensor from a retail environment including the shelving unit (Step 1604); analyzing the vibration data and the at least one image to detect an action performed in the retail environment (Step 1606); and providing information based on the detected action (Step 1608).
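For orientation, below is a minimal sketch of how the four steps of method 1600 could be chained together in code. The callables passed in (sensor readers, the joint detector, and the reporting function) are hypothetical placeholders standing in for Steps 1602 through 1608, under the assumption that each step is implemented elsewhere; this is an illustrative sketch, not the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class DetectedAction:
    action_type: str      # e.g., "picking a product", "placing a label"
    confidence: float

def method_1600(read_vibration: Callable[[], Sequence[float]],
                read_images: Callable[[], Sequence[object]],
                detect: Callable[[Sequence[float], Sequence[object]], Optional[DetectedAction]],
                report: Callable[[DetectedAction], None]) -> None:
    vibration_data = read_vibration()        # Step 1602: vibration data from shelf-mounted sensors
    images = read_images()                   # Step 1604: image(s) of the retail environment
    action = detect(vibration_data, images)  # Step 1606: joint analysis of both modalities
    if action is not None:
        report(action)                       # Step 1608: provide information based on the detected action
```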

In some examples, Step 1602 may comprise receiving vibration data captured using one or more vibration sensors mounted to a shelving unit including at least one retail shelf. For example, receiving the vibration data by Step 1602 may comprise at least one of reading the vibration data, receiving the vibration data from an external device (for example, using a digital communication device), capturing the vibration data using the one or more vibration sensors mounted to a shelving unit including at least one retail shelf, and so forth. In some examples, the one or more vibration sensors may be at least one of active vibration sensors, passive vibration sensors, thermal vibration sensors, pyroelectric vibration sensors, thermoelectric vibration sensors, photoconductive vibration sensors and photovoltaic vibration sensors. In one example, the one or more vibration sensors may be one or more passive vibration sensors. In some examples, the one or more vibration sensors may be one or more vibration sensors positioned below a second retail shelf. In one example, the second retail shelf may be positioned above the retail shelf. For example, the one or more vibration sensors may be one or more vibration sensors mounted to the second retail shelf, mounted to a surface (for example, of a wall, of a rack, etc.) connecting the second retail shelf and the retail shelf, and so forth. In some examples, the one or more vibration sensors may be one or more vibration sensors mounted to a second retail shelf. In one example, the second retail shelf may be positioned on an opposite side of an aisle from the retail shelf.

In some examples, Step 1604 may comprise receiving at least one image captured using at least one image sensor from a retail environment (for example, a retail environment including the shelving unit of Step 1602), for example as described above. In some examples, receiving at least one image by Step 1604 may comprise at least one of reading the at least one image, receiving the at least one image from an external device (for example, using a digital communication device), capturing the at least one image using the at least one image sensor from the retail environment, and so forth. In some examples, the at least one image sensor of Step 1604 may be at least one image sensor mounted to a second retail shelf, for example as illustrated in FIG. 4A, FIG. 6A and FIG. 6B. In some examples, the at least one image sensor of Step 1604 may be at least one image sensor mounted to an image capturing robot (for example, a wheeled robot such as capturing device 125G, a legged robot, a snake-like robot, and so forth). In some examples, the at least one image sensor of Step 1604 may be at least one image sensor mounted to a ceiling of a retail store. In some examples, the at least one image sensor of Step 1604 may be part of a personal mobile device, such as capturing device 125D. In some examples, the at least one image received by Step 1604 may include at least one three-dimensional image (such as a range image, a stereo image, a depth image, a three-dimensional array of voxels, and so forth).

In some examples, Step 1606 may comprise analyzing the vibration data received by Step 1602 and the at least one image received by Step 1604 to detect an action performed in the retail environment. In some examples, the action may include at least one of picking a product from a retail shelf, placing a product on a retail shelf and moving a product on a retail shelf. Some other non-limiting examples of such actions may include placing a label (such as a shelf label), removing a label (such as a shelf label), placing a promotional sign, removing a promotional sign, changing a price, cleaning, restocking, rearranging products, and so forth. In some examples, a machine learning model may be trained using training examples to detect actions from vibration data and images. An example of such a training example may include a sample vibration data and a sample image, together with a label indicating whether the sample vibration data and the sample image correspond to an action performed in an environment. In one example, Step 1606 may use the trained machine learning model to analyze the vibration data received by Step 1602 and the at least one image received by Step 1604 to detect the action performed in the retail environment. In some examples, Step 1606 may use an artificial neural network to analyze the vibration data received by Step 1602 and the at least one image received by Step 1604 to detect the action performed in the retail environment.
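As one possible concretization of the machine learning approach described above, the sketch below trains a generic classifier on hand-crafted joint features of vibration data and an image. The feature design, the choice of a random forest, and the structure of the training records are assumptions for illustration, not the disclosed model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def action_features(vibration, image):
    # Hypothetical joint features: vibration summary statistics concatenated
    # with a coarsely downsampled, flattened copy of the image
    # (all images are assumed to share the same shape).
    v = np.asarray(vibration, dtype=float)
    img = np.asarray(image, dtype=float)
    small = img[::max(img.shape[0] // 8, 1), ::max(img.shape[1] // 8, 1)]
    return np.concatenate([[v.mean(), v.std(), np.abs(v).max()], small.ravel()])

def train_action_detector(examples):
    # examples: iterable of (vibration, image, label) triples, where the label
    # indicates whether an action was performed (as in the training examples above).
    examples = list(examples)
    X = np.stack([action_features(v, img) for v, img, _ in examples])
    y = np.array([label for _, _, label in examples])
    return RandomForestClassifier(n_estimators=100).fit(X, y)

def detect_action(model, vibration, image):
    return bool(model.predict(action_features(vibration, image)[None, :])[0])
```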

In some examples, Step 1606 may calculate a convolution of at least part of the at least one image received by Step 1604 to obtain a value of the calculated convolution, and may use the value of the calculated convolution to analyze the vibration data received by Step 1602 to detect the action performed in the retail environment. For example, Step 1606 may analyze the vibration data received by Step 1602 using a parametric model to detect the action performed in the retail environment, and a parameter of the parametric model may be selected based on the value of the calculated convolution. In another example, in response to a first value of the calculated convolution, Step 1606 may analyze the vibration data received by Step 1602 using a first analysis step to detect the action performed in the retail environment, and in response to a second value of the calculated convolution, Step 1606 may analyze the vibration data received by Step 1602 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.

In some examples, Step 1606 may calculate a convolution of at least part of the vibration data received by Step 1602 to obtain a value of the calculated convolution, and may use the value of the calculated convolution to analyze the at least one image received by Step 1604 to detect the action performed in the retail environment. For example, Step 1606 may analyze the at least one image received by Step 1604 using a parametric model to detect the action performed in the retail environment, and a parameter of the parametric model may be selected based on the value of the calculated convolution. In another example, in response to a first value of the calculated convolution, Step 1606 may analyze the at least one image received by Step 1604 using a first analysis step to detect the action performed in the retail environment, and in response to a second value of the calculated convolution, Step 1606 may analyze the at least one image received by Step 1604 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.
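A brief sketch, for illustration, of the pattern described in this paragraph and the preceding one: a convolution value computed from one modality selects between two alternative analysis steps for the other modality. The smoothing kernel, the threshold, and the two analysis callables are arbitrary assumptions introduced here.

```python
import numpy as np

def convolution_value(data, kernel=(0.25, 0.5, 0.25)):
    # "Value of the calculated convolution": here, the maximum response of a
    # small smoothing kernel over the flattened input (an image part or
    # vibration data); the input is assumed to contain at least three values.
    flat = np.ravel(np.asarray(data, dtype=float))
    return float(np.convolve(flat, np.asarray(kernel, dtype=float), mode="valid").max())

def detect_with_selected_analysis(gating_data, analyzed_data,
                                  first_analysis, second_analysis, threshold=1.0):
    # The convolution value from one modality (gating_data) decides which
    # analysis step is applied to the other modality (analyzed_data).
    value = convolution_value(gating_data)
    analysis = first_analysis if value < threshold else second_analysis
    return analysis(analyzed_data)
```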

In some examples, the vibration data received by Step 1602 may include a time series of samples captured using the one or more vibration sensors at different points in time. In some examples, Step 1606 may compare two samples of the time series of samples, and may use a result of the comparison to analyze the at least one image received by Step 1604 to detect the action performed in the retail environment. For example, Step 1606 may analyze the at least one image received by Step 1604 using a parametric model to detect the action performed in the retail environment, and a parameter of the parametric model may be selected based on the result of the comparison. In another example, in response to a first result of the comparison, Step 1606 may analyze the at least one image received by Step 1604 using a first analysis step to detect the action performed in the retail environment, and in response to a second result of the comparison, Step 1606 may analyze the at least one image received by Step 1604 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.

In some examples, the at least one image received by Step 1604 may include a plurality of frames of a video captured using the at least one image sensor. In some examples, Step 1606 may compare two frames of the plurality of frames, and may use a result of the comparison to analyze the vibration data received by Step 1602 to detect the action performed in the retail environment. For example, Step 1606 may analyze the vibration data received by Step 1602 using a parametric model to detect the action performed in the retail environment, and a parameter of the parametric model may be selected based on the result of the comparison. In another example, in response to a first result of the comparison, Step 1606 may analyze the vibration data received by Step 1602 using a first analysis step to detect the action performed in the retail environment, and in response to a second result of the comparison, Step 1606 may analyze the vibration data received by Step 1602 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.
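As a small illustrative sketch of the frame-comparison variant just described (and, symmetrically, of the sample-comparison variant two paragraphs above), the mean absolute difference between two frames selects a hypothetical sensitivity parameter for a very simple vibration-threshold analysis. All numeric constants and the analysis itself are arbitrary assumptions, not the disclosed method.

```python
import numpy as np

def frame_difference(frame_a, frame_b):
    # Result of the comparison: mean absolute pixel difference between two frames.
    a = np.asarray(frame_a, dtype=float)
    b = np.asarray(frame_b, dtype=float)
    return float(np.abs(a - b).mean())

def detect_from_vibration(vibration, frame_a, frame_b,
                          quiet_scene_threshold=2.0,
                          sensitive_level=1.5, relaxed_level=5.0):
    # If the two frames barely changed, use the more sensitive vibration
    # threshold; otherwise use the relaxed one (a parameter selected based on
    # the result of the comparison, as described above).
    level = sensitive_level if frame_difference(frame_a, frame_b) < quiet_scene_threshold else relaxed_level
    v = np.asarray(vibration, dtype=float)
    return bool(np.abs(v - v.mean()).max() > level)
```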

In some examples, Step 1606 may analyze the vibration data received by Step 1602 to select a portion of the at least one image received by Step 1604. For example, in response to a first vibration data received by Step 1602, Step 1606 may select a first portion of the at least one image received by Step 1604, and in response to a second vibration data received by Step 1602, Step 1606 may select a second portion of the at least one image received by Step 1604, the second portion may differ from the first portion. In another example, the vibration data received by Step 1602 may include spatial properties, and Step 1606 may select the portion of the at least one image received by Step 1604 based on the spatial properties. For example, the spatial properties may include an indication of a region in the retail environment, and Step 1606 may select a portion of the at least one image received by Step 1604 corresponding to the indicated region of the retail environment. Further, in some examples, Step 1606 may analyze the selected portion of the at least one image to detect the action performed in the retail environment, for example using the image analysis described above.
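The sketch below illustrates, under assumptions introduced here, how spatial properties of the vibration data (which sensor vibrated most strongly) might select a portion of the image. The sensor-to-region mapping and the pixel coordinates are hypothetical examples of such spatial properties, not values from the disclosure.

```python
import numpy as np

# Hypothetical mapping from each vibration sensor to the image region
# (top, bottom, left, right, in pixels) of the shelf section it monitors.
SENSOR_REGIONS = {
    0: (0, 240, 0, 320),
    1: (0, 240, 320, 640),
}

def select_image_portion(image, energy_per_sensor):
    # energy_per_sensor: dict mapping sensor id to vibration energy; the sensor
    # with the strongest signal indicates the region where the action likely occurred.
    strongest = max(energy_per_sensor, key=energy_per_sensor.get)
    top, bottom, left, right = SENSOR_REGIONS[strongest]
    return np.asarray(image)[top:bottom, left:right]
```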

In some examples, Step 1606 may comprise analyzing the vibration data received by Step 1602 to attempt to detect the action performed in the retail environment, for example using a pattern recognition algorithm. In some examples, for example in response to a failure of the attempt to successfully detect the action, Step 1606 may analyze the at least one image received by Step 1604 to detect the action performed in the retail environment, for example using a visual action recognition algorithm. In one example, for example in response to a failure to successfully detect the action, method 1600 may trigger the capturing of the at least one image using the at least one image sensor. In one example, the failure to successfully detect the action may be a failure to successfully detect the action at a confidence level higher than a selected threshold. In another example, the failure to successfully detect the action may be a failure to determine at least one aspect of the action. Some non-limiting examples of such an aspect may include at least one of a type of the action, a product type associated with the action, and a quantity of products associated with the action.
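A minimal sketch of the fallback logic just described, with the vibration-based detector, the image-capture trigger, and the visual recognizer all passed in as hypothetical callables; the confidence threshold is an assumed example of the selected threshold.

```python
def recognize_action(vibration, detect_from_vibration, capture_images, detect_from_images,
                     confidence_threshold=0.8):
    # Attempt detection from the vibration data alone; the callable is assumed
    # to return (action, confidence) on success or None on failure.
    result = detect_from_vibration(vibration)
    if result is not None and result[1] >= confidence_threshold:
        return result[0]
    # Failure (no detection, or confidence below the selected threshold):
    # trigger capturing of the image(s) and fall back to visual recognition.
    images = capture_images()
    return detect_from_images(images)
```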

In some examples, Step 1608 may comprise providing information based on the action detected by Step 1606. For example, providing the information based on the action detected by Step 1606 may comprise at least one of storing the information in memory, transmitting the information to an external device, providing the information to a user (for example, visually, audibly, textually, through a user interface, etc.), and so forth.

In some examples, detecting the action performed in the retail environment by Step 1606 may further include recognizing a type of the action. For example, Step 1606 may use a classification model to classify the action to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different type of action. In another example, Step 1606 may analyze the vibration data received by Step 1602 and the at least one image received by Step 1604 (for example using the classification model, using a machine learning model trained using training examples to recognize types of actions from records including both vibration data and images, using an artificial neural network, and so forth) to recognize the type of the action. Some non-limiting examples of such types of actions may include picking an item, picking a product, placing an item, placing a product, moving an item, moving a product, placing a label (such as a shelf label), removing a label (such as a shelf label), placing a promotional sign, removing a promotional sign, changing a price, cleaning, restocking, rearranging products, and so forth. Further, in some examples, the information provided by Step 1608 may be based on the type of the action. In one example, the information provided by Step 1608 may include an indication of the type of the action. In one example, in response to a first type of the action, Step 1608 may provide first information, and in response to a second type of the action, Step 1608 may provide second information, the second information may differ from the first information. In one example, in response to a first type of the action, Step 1608 may provide the information, and in response to a second type of the action, Step 1608 may forgo providing the information.

In some examples, detecting the action performed in the retail environment by Step 1606 may further include identifying a product type associated with the action. For example, Step 1606 may use a classification model to classify the action to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different product type. In another example, Step 1606 may analyze the vibration data received by Step 1602 and the at least one image received by Step 1604 (for example using the classification model, using a machine learning model trained using training examples to identify product types of products associated with actions from records including both vibration data and images, using an artificial neural network, and so forth) to identify the product type. In one example, the action may include at least one of picking, placing and moving a product, and the product type associated with the action may be a product type of the product. In one example, the action may include at least one of placing and removing a label (such as a shelf label), and the product type associated with the action may be a product type indicated by the label (for example, by text printed on the label, by a logo on the label, by a picture on the label, by a visual code on the label, and so forth). In one example, the action may include at least one of placing and removing a promotional sign, and the product type associated with the action may be a product type associated with the promotional sign. In one example, the action may include changing a price of products of a particular product type, and the product type associated with the action may be the particular product type. Further, in some examples, the information provided by Step 1608 may be based on the product type associated with the action. In one example, the information provided by Step 1608 may include an indication of the product type (for example, a textual indication, a picture of a product of the product type, a barcode associated with the product type, and so forth). In one example, in response to a first product type associated with the action, Step 1608 may provide first information, and in response to a second product type associated with the action, Step 1608 may provide second information, the second information may differ from the first information. In one example, in response to a first product type associated with the action, Step 1608 may provide the information, and in response to a second product type associated with the action, Step 1608 may forgo providing the information.

In some examples, detecting the action performed in the retail environment by Step 1606 may further include determining a quantity of products associated with the action. For example, Step 1606 may use a regression model to determine the quantity of products associated with the action. In another example, Step 1606 may analyze the vibration data received by Step 1602 and the at least one image received by Step 1604 (for example using the classification model, using a machine learning model trained using training examples to determine quantities of products associated with actions from records including both vibration data and images, using an artificial neural network, and so forth) to determine the quantity of products associated with the action. In one example, the action may include at least one of picking, placing and moving at least one product, and the quantity of products associated with the action may be the quantity of products picked, placed and/or moved in the action. In one example, the action may include at least one of placing and removing a promotional sign, and the quantity of products associated with the action may be a quantity of products indicated in the promotional sign. Further, in some examples, the information provided by Step 1608 may be based on the quantity of products associated with the action. In one example, the information provided by Step 1608 may include an indication of the quantity of products associated with the action. In one example, in response to a first quantity of products associated with the action, Step 1608 may provide first information, and in response to a second quantity of products associated with the action, Step 1608 may provide second information, the second information may differ from the first information. In one example, in response to a first quantity of products associated with the action, Step 1608 may provide the information, and in response to a second quantity of products associated with the action, Step 1608 may forgo providing the information.

In some examples, the vibration data received by Step 1602 may include a time series of samples captured using the one or more vibration sensors at different points in time. In some examples, Step 1604 may further comprise analyzing the time series of the samples captured using the one or more vibration sensors at the different points in time to select the at least one image of a plurality of images. For example, in response to a first result of the analysis of the time series of samples, Step 1604 may select a first subgroup of the plurality of images, and in response to a second result of the analysis of the time series of samples, Step 1604 may select a second subgroup of the plurality of images, the second subgroup may differ from the first subgroup. In another example, Step 1604 may analyze the time series of the samples captured using the one or more vibration sensors at the different points in time to select a particular point in time (for example, a point in time corresponding to an extremum of the samples, a point in time corresponding to a sample satisfying a particular criterion, and so forth), each image of the plurality of images may correspond to a different point in time (for example, based on the capturing time of the image), and Step 1604 may select the image of the plurality of images corresponding to the particular point in time (or corresponding to a point in time nearest to the particular point in time of the points in time corresponding to the plurality of images).
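For illustration, a minimal sketch (with hypothetical timestamp arguments) of selecting, from a plurality of images, the one whose capturing time is nearest to the point in time of an extremum of the vibration time series, as described above.

```python
import numpy as np

def select_image_at_extremum(sample_times, samples, image_times, images):
    # Point in time corresponding to an extremum of the samples
    # (here, the largest absolute vibration sample).
    samples = np.asarray(samples, dtype=float)
    t_extremum = np.asarray(sample_times, dtype=float)[int(np.argmax(np.abs(samples)))]
    # Image whose capturing time is nearest to that point in time.
    nearest = int(np.argmin(np.abs(np.asarray(image_times, dtype=float) - t_extremum)))
    return images[nearest]
```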

In some examples, Step 1606 may calculate a convolution of at least part of the at least one image to obtain a value of the calculated convolution. Further, Step 1606 may analyze the vibration data to determine a frequency associated with the vibration data, for example using spectral analysis of the vibration data, using narrow-band frequency analysis, and so forth. Some non-limiting examples of such a determined frequency associated with the vibration data may include a prominent periodic frequency, a prominent frequency in a selected range of frequencies, the second most prominent periodic frequency, and so forth. In one example, in response to a first combination of the value of the calculated convolution and the frequency associated with the vibration data, Step 1606 may detect the action performed in the retail environment, and in response to a second combination of the value of the calculated convolution and the frequency associated with the vibration data, Step 1606 may forgo the detection of the action performed in the retail environment. In another example, in response to a first combination of the value of the calculated convolution and the frequency associated with the vibration data, Step 1606 may determine a first type of the action performed in the retail environment, and in response to a second combination of the value of the calculated convolution and the frequency associated with the vibration data, Step 1606 may determine a second type of the action performed in the retail environment, the second type may differ from the first type.
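To make the frequency analysis concrete, the following sketch estimates a prominent periodic frequency of the vibration data with a discrete Fourier transform and combines it with an image convolution value in an arbitrary illustrative rule. The thresholds and the example action labels are assumptions, not the disclosed decision logic.

```python
import numpy as np

def prominent_frequency(samples, sampling_rate_hz):
    # Spectral analysis of the vibration data: frequency bin with the largest
    # magnitude, after removing the mean to suppress the DC component.
    x = np.asarray(samples, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sampling_rate_hz)
    return float(freqs[int(np.argmax(spectrum))])

def act_on_combination(conv_value, frequency_hz,
                       conv_threshold=1.0, frequency_threshold=5.0):
    # Illustrative rule: a weak convolution value forgoes detection; otherwise
    # the frequency distinguishes between two example action types.
    if conv_value < conv_threshold:
        return None
    return "restocking" if frequency_hz < frequency_threshold else "picking a product"
```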

The foregoing description has been presented for purposes of illustration. It is not exhaustive and is not limited to the precise forms or embodiments disclosed. Modifications and adaptations will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed embodiments. Additionally, although aspects of the disclosed embodiments are described as being stored in memory, one skilled in the art will appreciate that these aspects can also be stored on other types of computer readable media, such as secondary storage devices, for example, hard disks or CD ROM, or other forms of RAM or ROM, USB media, DVD, Blu-ray, 4K Ultra HD Blu-ray, or other optical drive media.

Computer programs based on the written description and disclosed methods are within the skill of an experienced developer. The various programs or program modules can be created using any of the techniques known to one skilled in the art or can be designed in connection with existing software. For example, program sections or program modules can be designed in or by means of .Net Framework, .Net Compact Framework (and related languages, such as Visual Basic, C, etc.), Java, C++, Objective-C, HTML, HTML/AJAX combinations, XML, or HTML with included Java applets.

Moreover, while illustrative embodiments have been described herein, the scope includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations and/or alterations as would be appreciated by those skilled in the art based on the present disclosure. The limitations in the claims are to be interpreted broadly based on the language employed in the claims and not limited to examples described in the present specification or during the prosecution of the application. The examples are to be construed as non-exclusive. Furthermore, the steps of the disclosed methods may be modified in any manner, including by reordering steps and/or inserting or deleting steps. It is intended, therefore, that the specification and examples be considered as illustrative only, with a true scope and spirit being indicated by the following claims and their full scope of equivalents.

1.-80. (canceled)
81. A non-transitory computer-readable medium including instructions that when executed by a processor cause the processor to perform a method for using vibration data analysis and image analysis for robust action recognition in retail environment, the method comprising: receiving vibration data captured using one or more vibration sensors mounted to a shelving unit including at least one retail shelf; receiving at least one image captured using at least one image sensor from a retail environment including the shelving unit; analyzing the vibration data and the at least one image to detect an action performed in the retail environment; and providing information based on the detected action.
82. The non-transitory computer-readable medium of claim 81, wherein the action includes at least one of picking a product from a retail shelf, placing a product on a retail shelf and moving a product on a retail shelf.
83. The non-transitory computer-readable medium of claim 81, wherein detecting the action performed in the retail environment includes recognizing a type of the action.
84. The non-transitory computer-readable medium of claim 81, wherein detecting the action performed in the retail environment includes at least one of identifying a product type associated with the action and determining a quantity of products associated with the action.
85. The non-transitory computer-readable medium of claim 81, wherein the method further comprises: calculating a convolution of at least part of the at least one image to obtain a value of the calculated convolution; and using the value of the calculated convolution to analyze the vibration data to detect the action performed in the retail environment.
86. The non-transitory computer-readable medium of claim 81, wherein the method further comprises: calculating a convolution of at least part of the vibration data to obtain a value of the calculated convolution; and using the value of the calculated convolution to analyze the at least one image to detect the action performed in the retail environment.
87. The non-transitory computer-readable medium of claim 81, wherein the method further comprises: calculating a convolution of at least part of the at least one image to obtain a value of the calculated convolution; analyzing the vibration data to determine a frequency associated with the vibration data; in response to a first combination of the value of the calculated convolution and the frequency associated with the vibration data, detecting the action performed in the retail environment; and in response to a second combination of the value of the calculated convolution and the frequency associated with the vibration data, forgoing the detection of the action performed in the retail environment.
88. The non-transitory computer-readable medium of claim 81, wherein the vibration data includes a time series of samples captured using the one or more vibration sensors at different points in time.
89. The non-transitory computer-readable medium of claim 88, wherein the method further comprises analyzing the time series of samples to select the at least one image of a plurality of images.
90. The non-transitory computer-readable medium of claim 88, wherein the method further comprises: comparing two samples of the time series of samples; and using a result of the comparison to analyze the at least one image to detect the action performed in the retail environment.
91. The non-transitory computer-readable medium of claim 81, wherein the at least one image includes a plurality of frames of a video captured using the at least one image sensor.
92. The non-transitory computer-readable medium of claim 91, wherein the method further comprises: comparing two frames of the plurality of frames; and using a result of the comparison to analyze the vibration data to detect the action performed in the retail environment.
93. The non-transitory computer-readable medium of claim 81, wherein the at least one image includes at least one three-dimensional image.
94. The non-transitory computer-readable medium of claim 81, wherein the method further comprises: analyzing the vibration data to select a portion of the at least one image; and analyzing the selected portion of the at least one image to detect the action performed in the retail environment.
95. The non-transitory computer-readable medium of claim 81, wherein the method further comprises: analyzing the vibration data to attempt to detect the action performed in the retail environment; and in response to a failure of the attempt to successfully detect the action, analyzing the at least one image to detect the action performed in the retail environment.
96. The non-transitory computer-readable medium of claim 95, wherein the failure to successfully detect the action is a failure to successfully detect the action at a confidence level higher than a selected threshold.
97. The non-transitory computer-readable medium of claim 95, wherein the failure to successfully detect the action is a failure to determine at least one aspect of the action.
98. The non-transitory computer-readable medium of claim 95, wherein the method further comprises, in response to a failure to successfully detect the action, triggering the capturing of the at least one image using the at least one image sensor.
99. A system for using vibration data analysis and image analysis for robust action recognition in retail environment, the system comprising: at least one processing unit configured to: receive vibration data captured using one or more vibration sensors mounted to a shelving unit including at least one retail shelf; receive at least one image captured using at least one image sensor from a retail environment including the shelving unit; analyze the vibration data and the at least one image to detect an action performed in the retail environment; and provide information based on the detected action.
100. A method for using vibration data analysis and image analysis for robust action recognition in retail environment, the method comprising: receiving vibration data captured using one or more vibration sensors mounted to a shelving unit including at least one retail shelf; receiving at least one image captured using at least one image sensor from a retail environment including the shelving unit; analyzing the vibration data and the at least one image to detect an action performed in the retail environment; and providing information based on the detected action.